Methods of locating and tracking robotic instruments in robotic surgical systems

ABSTRACT

In one embodiment of the invention, a method is disclosed to locate a robotic instrument in the field of view of a camera. The method includes capturing sequential images in a field of view of a camera. The sequential images are correlated between successive views. The method further includes receiving a kinematic datum to provide an approximate location of the robotic instrument and then analyzing the sequential images in response to the approximate location of the robotic instrument. An additional method for robotic systems is disclosed. Further disclosed is a method for indicating tool entrance into the field of view of a camera.

FIELD

The embodiments of the invention relate generally to robots and robotic tools or instruments. More particularly, the embodiments of the invention relate to the acquisition and tracking of the position and orientation of robotic tools or instruments.

BACKGROUND

Minimally invasive surgical (MIS) procedures have become more common using robotic (e.g., telerobotic) surgical systems. An endoscopic camera is typically used to provide images of the surgical cavity to a surgeon so that the surgeon can manipulate robotic surgical tools therein. However, if the robotic surgical tool is not in the field of view of the camera, or it is otherwise hidden by tissue or other surgical tools, a surgeon may be left guessing how to move the robotic surgical tool while it is obscured from his or her view.

Moreover, tissue or organs of interest in a surgical cavity are often obscured from view. A surgeon may have to initially guess the location of an organ of interest within a surgical cavity and search around therein to place the organ and the robotic surgical tools within the field of view of the endoscopic camera.

To better localize a surgical tool in the field of view, optical devices, such as light emitting diodes, have been attached to robotic surgical tools. However, optical devices can interfere with endoscopic surgical procedures and may not provide sufficiently accurate position and orientation information for a minimally invasive surgical system. A magnetic device may be applied to a robotic surgical tool in an attempt to magnetically sense its location. However, robotic surgical tools are often formed of metal, and a magnetic device may not work well due to the interference generated by the movement of metal tools and electrical motors in a minimally invasive surgical system. Moreover, such devices may provide only a single cue to the position of a robotic surgical tool.

BRIEF SUMMARY

The embodiments of the invention are summarized by the claims that follow below.

BRIEF DESCRIPTIONS OF THE DRAWINGS

FIG. 1A is a block diagram of a robotic medical system including a stereo viewer and an image guided surgery (IGS) system with a tool tracking sub-system.

FIG. 1B is a block diagram of a patient side cart including robotic surgical arms to support and move robotic instruments.

FIG. 2 is a functional block diagram of the video portion of the IGS system to provide a stereo image in both left and right video channels to provide three-dimensional images in a stereo viewer.

FIG. 3 is a perspective view of a robotic surgical master control console including a stereo viewer and an IGS system with a tool tracking sub-system.

FIG. 4 is a perspective view of the stereo viewer of the robotic surgical master control console.

FIG. 5A is a perspective view of a sequence of video frames including video images of a robotic medical tool that may be used to perform tool tracking.

FIG. 5B illustrates different tool positions of a pair of tools in the field of view of a camera based on kinematic information and video image information.

FIG. 6A is a functional block diagram of a tool tracking architecture and methodology for a robotic system including one or more robotic instruments.

FIG. 6B is a flow chart of a tool tracking library and its application.

FIG. 7 is a block diagram illustrating various techniques that may be combined together to meet the challenges in tool tracking.

FIG. 8 is a functional flow-chart of a tool tracking system.

FIG. 9A is a figure to illustrate the process of pure image segmentation to localize a tool within an image.

FIG. 9B is a figure to illustrate the process of sequence matching and/or model-based synthesis to localize a tool within an image.

FIG. 9C is a more detailed figure to illustrate the process of model-based synthesis to localize a tool within an image.

FIGS. 10A-10B illustrate elements of a state-space model to adaptively fuse robot kinematics information and vision-based information together.

FIGS. 11A-11C illustrate various image matching techniques that may be used separately or collectively to determine pose information of a tool.

FIG. 12A is a diagram illustrating adaptive fusion under different viewing conditions.

FIG. 12B is a diagram illustrating a set up for parallel stereo.

FIGS. 12C-12E are charts illustrating various view geometry statistics that may be used to enhance the performance of the state space model for adaptively fusing information sources together.

FIGS. 13A-13B are diagrams illustrating sequence matching of a feature in a sequence of one or more images from different sources.

FIG. 14 is a diagram illustrating appearance learning of objects within an image.

FIG. 15 is a flow chart of the application of tool tracking to image guided surgery.

FIG. 16 is a perspective view of overlaying a pre-scanned image of tissue onto a depth map of a surgical site to provide image-guided surgery.

FIG. 17 is a perspective view of a surgical site with an ultrasound tool capturing ultrasound images for overlay onto a display.

DETAILED DESCRIPTION

In the following detailed description of the embodiments of the invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be obvious to one skilled in the art that the embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the invention.

Introduction

Aspects of the invention include methods, apparatus, and integrated systems for tool acquisition (locating) and tool tracking (kinematics-tracking (pose predicting) and full-tracking) of robotic medical tools. The method/system for tool tracking systematically and efficiently integrates robot kinematics and visual information to obtain pose (position/orientation) information, which can be used to obtain a more accurate pose of a robotic surgical tool than robot kinematics or visual information alone, in either a camera coordinate system or a base coordinate system. A known kinematics transformation can be applied to the pose correction to achieve an improved pose in any related coordinate system. A camera coordinate system is a coordinate system based on a chosen camera (for example, $(X^r, Y^r, Z^r)$ in FIG. 12B) or a common reference coordinate system for multiple cameras (for example, $(X_S, Y_S, Z_S)$ in FIG. 12B). In some aspects, tool tracking explores prior available information, such as the CAD models of tools, and dynamically learns the image appearances of the robotic instruments. In some aspects, tool tracking may be markerless so as not to interfere with normal robotic surgical procedures. Furthermore, tool tracking may provide continuous pose information of the robotic instruments, including their relationships with other tools (e.g., tool A is on top of tool B and hence partially occludes tool B), so that image-based segmentation of the tools may be avoided.

Robotic Medical System

Referring now to FIG. 1A, a block diagram of a robotic surgery system 100 is illustrated to perform minimally invasive robotic surgical procedures using one or more robotic arms 158. Aspects of system 100 include telerobotic and autonomously operating features. These robotic arms often support a robotic instrument. For instance, a robotic surgical arm (e.g., the center robotic surgical arm 158C) may be used to support a stereo or three-dimensional surgical image capture device 101C such as a stereo endoscope (which may be any of a variety of structures such as a stereo laparoscope, arthroscope, hysteroscope, or the like), or, optionally, some other imaging modality (such as ultrasound, fluoroscopy, magnetic resonance imaging, or the like). Robotic surgery may be used to perform a wide variety of surgical procedures, including but not limited to open surgery, neurosurgical procedures (e.g., stereotaxy), endoscopic procedures (e.g., laparoscopy, arthroscopy, thoracoscopy), and the like.

A user or operator O (generally a surgeon) performs a minimally invasive surgical procedure on patient P by manipulating control input devices 160 at a master control console 150. A computer 151 of the console 150 directs movement of robotically controlled endoscopic surgical instruments 101A-101C via control lines 159, effecting movement of the instruments using a robotic patient-side system 152 (also referred to as a patient-side cart).

The robotic patient-side system 152 includes one or more robotic arms 158. Typically, the robotic patient-side system 152 includes at least three robotic surgical arms 158A-158C (generally referred to as robotic surgical arms 158) supported by corresponding positioning set-up arms 156. The central robotic surgical arm 158C may support an endoscopic camera 101C. The robotic surgical arms 158A and 158B to the left and right of center may support robotic instruments 101A and 101B, respectively, that may manipulate tissue.

Robotic instruments are generally referred to herein by the reference number 101. Robotic instruments 101 may be any instrument or tool that couples to a robotic arm, can be manipulated thereby, and can report back kinematics information to the robotic system. Robotic instruments include, but are not limited to, surgical tools, medical tools, bio-medical tools, and diagnostic instruments (ultrasound, computer tomography (CT) scanner, magnetic resonance imager (MRI)).

Generally, the robotic patient-side system 152 includes a positioning portion and a driven portion. The positioning portion of the robotic patient-side system 152 remains in a fixed configuration during surgery while tissue is manipulated. The driven portion of the robotic patient-side system 152 is actively articulated under the direction of the operator O generating control signals at the surgeon's console 150 during surgery. The driven portion of the robotic patient-side system 152 may include, but is not limited or restricted to, the robotic surgical arms 158A-158C.

The instruments 101, the robotic surgical arms 158A-158C, and the set-up joints 156, 157 may include one or more displacement transducers, positional sensors, and/or orientational sensors 185, 186 to assist in acquisition and tracking of robotic instruments. From instrument tip to ground (or world coordinate) of the robotic system, the kinematics information generated by the transducers and the sensors in the robotic patient-side system 152 is reported back to the robotic system and a tool tracking and image guided surgery (IGS) system 351.

As an exemplary embodiment, the positioning portion of the robotic patient-side system 152 that is in a fixed configuration during surgery may include, but is not limited or restricted to, the set-up arms 156. Each set-up arm 156 may include a plurality of links and a plurality of joints. Each set-up arm may mount via a first set-up joint 157 to the patient-side system 152.

An assistant A may assist in pre-positioning of the robotic patient-side system 152 relative to patient P, as well as swapping tools or instruments 101 for alternative tool structures, and the like, while viewing the internal surgical site via an external display 154. The external display 154, or another external display 154, may be positioned or located elsewhere so that images of the surgical site may be displayed to students or other interested persons during a surgery. Images with additional information may be overlaid onto the images of the surgical site by the robotic surgical system for display on the external display 154.

Referring now to FIG. 1B, a perspective view of the robotic patient-side system 152 is illustrated. The robotic patient-side system 152 comprises a cart column 170 supported by a base 172. One or more robotic surgical arms 158 are respectively attached to one or more set-up arms 156 that are a part of the positioning portion of the robotic patient-side system 152. Situated approximately at a central location on base 172, the cart column 170 includes a protective cover 180 that protects components of a counterbalance subsystem and a braking subsystem (described below) from contaminants.

Excluding a monitor arm 154, each robotic surgical arm 158 is used to control robotic instruments 101A-101C. Moreover, each robotic surgical arm 158 is coupled to a set-up arm 156 that is in turn coupled to a carriage housing 190 in one embodiment of the invention, as described below with reference to FIG. 3. The one or more robotic surgical arms 158 are each supported by their respective set-up arm 156, as is illustrated in FIG. 1B.

The robotic surgical arms 158A-158D may each include one or more displacement transducers, orientational sensors, and/or positional sensors 185 to generate raw uncorrected kinematics data, kinematics datum, and/or kinematics information to assist in acquisition and tracking of robotic instruments. The robotic instruments may also include a displacement transducer, a positional sensor, and/or orientation sensor 186 in some embodiments of the invention. Moreover, one or more robotic instruments may include a marker 189 to assist in acquisition and tracking of robotic instruments.

Endoscopic Video System

Referring now to FIG. 2, the stereo endoscopic camera 101C includes an endoscope 202 for insertion into a patient, a camera head 204, a left image forming device (e.g., a charge coupled device (CCD)) 206L, a right image forming device 206R, a left camera control unit (CCU) 208L, and a right camera control unit (CCU) 208R coupled together as shown. The stereo endoscopic camera 101C generates a left video channel 220L and a right video channel 220R of frames of images of the surgical site coupled to a stereo display device 164 through a video board 218. To initially synchronize left and right frames of data, a lock reference signal is coupled between the left and right camera control units 208L, 208R. The right camera control unit generates the lock signal that is coupled to the left camera control unit to synchronize the left video channel to the right video channel. However, the left camera control unit 208L may alternatively generate the lock reference signal so that the right video channel synchronizes to the left video channel.

The stereo display 164 includes a left monitor 230L and a right monitor 230R. As discussed further herein with reference to FIG. 4, the viewfinders or monitors 230L, 230R may be provided by a left display device 402L and a right display device 402R, respectively. The stereo images may be provided in color by a pair of color display devices 402L, 402R.

Additional details of a stereo endoscopic camera and a stereo display may be found in U.S. Pat. No. 5,577,991 entitled “Three Dimensional Vision Endoscope with Position Adjustment Means for Imaging Device and Visual Field Mask” filed on Jul. 7, 1995 by Akui et al.; U.S. Pat. No. 6,139,490 entitled “Stereoscopic Endoscope with Virtual Reality Viewing” filed on Nov. 10, 1997 by Breidenthal et al.; and U.S. Pat. No. 6,720,988 entitled “Stereo Imaging System and Method for use in Telerobotic Systems” filed on Aug. 20, 1999 by Gere et al.; all of which are incorporated herein by reference. Stereo images of a surgical site may be captured by other types of endoscopic devices and cameras with different structures. For example, a single optical channel may be used with a pair of spatially offset sensors to capture stereo images of the surgical site.

Referring now to FIG. 3, a perspective view of the robotic surgical master control console 150 is illustrated. The master control console 150 of the robotic surgical system 100 may include the computer 151, a binocular or stereo viewer 312, an arm support 314, a pair of control input wrists and control input arms in a workspace 316, foot pedals 318 (including foot pedals 318A-318B), and a viewing sensor 320. The master control console 150 may further include a tool tracking and image guided surgery system 351 coupled to the computer 151 for providing the tool images and tissue images overlaid on the visible surgical site images. Alternatively, the tool tracking and image guided surgery system 351 may be located elsewhere in the robotic surgical system 100, such as the patient side cart 152 or a separate computer system.

The stereo viewer 312 has two displays where stereo three-dimensional images of the surgical site may be viewed to perform minimally invasive surgery. When using the master control console, the operator O typically sits in a chair and moves his or her head into alignment with the stereo viewer 312 to view the three-dimensional annotated images of the surgical site. To ensure that the operator is viewing the surgical site when controlling the robotic instruments 101, the master control console 150 may include the viewing sensor 320 disposed adjacent the binocular display 312. When the system operator aligns his or her eyes with the binocular eye pieces of the display 312 to view a stereoscopic image of the surgical worksite, the operator's head sets off the viewing sensor 320 to enable the control of the robotic instruments 101. When the operator's head is removed from the area of the display 312, the viewing sensor 320 can disable or stop generating new control signals in response to movements of the touch sensitive handles in order to hold the state of the robotic instruments. Alternatively, the processing required for tool tracking and image guided surgery may be entirely performed using computer 151, given a sufficiently capable computing platform.

The arm support 314 can be used to rest the elbows or forearms of the operator O (typically a surgeon) while gripping touch sensitive handles of the control input wrists, one in each hand, in the workspace 316 to generate control signals. The touch sensitive handles are positioned in the workspace 316 disposed beyond the arm support 314 and below the viewer 312. This allows the touch sensitive handles to be moved easily in the control space 316 in both position and orientation to generate control signals. Additionally, the operator O can use his feet to control the foot pedals 318 to change the configuration of the surgical system and generate additional control signals to control the robotic instruments 101 as well as the endoscopic camera.

The computer 151 may include one or more microprocessors 302 to execute instructions and a storage device 304 to store software with executable instructions that may be used to generate control signals to control the robotic surgical system 100. The computer 151 with its microprocessors 302 interprets movements and actuation of the touch sensitive handles (and other inputs from the operator O or other personnel) to generate control signals to control the robotic surgical instruments 101 in the surgical worksite. In one embodiment of the invention, the computer 151 and the stereo viewer 312 map the surgical worksite into the controller workspace 316 so it feels and appears to the operator that the touch sensitive handles are working over the surgical worksite. The computer 151 may couple to the tool tracking and image guided surgery system 351 to execute software and perform computations for the elements of the image guided surgery unit.

The tool tracking system described herein may be considered as operating in an open-loop fashion if the surgeon operating the master console is not considered part of the system. If the robotic instrument is to be automatically controlled with the tool tracking system, such as in visual servoing systems used to control the pose of a robot's end-effector using visual information extracted from images, the tool tracking system may be considered to be operating in a closed visual-feedback loop.

Referring now to FIG. 4, a perspective view of the stereo viewer 312 of the master control console 150 is illustrated. To provide a three-dimensional perspective, the viewer 312 includes stereo images for each eye, including a left image 400L and a right image 400R of the surgical site including any robotic instruments 400, respectively in a left viewfinder 401L and a right viewfinder 401R. The images 400L and 400R in the viewfinders may be provided by a left display device 402L and a right display device 402R, respectively. The display devices 402L, 402R may optionally be pairs of cathode ray tube (CRT) monitors, liquid crystal displays (LCDs), or other types of image display devices (e.g., plasma, digital light projection, etc.). In the preferred embodiment of the invention, the images are provided in color by a pair of color display devices 402L, 402R, such as color CRTs or color LCDs.

In the stereo viewer, three-dimensional maps of the anatomy (a depth map with respect to a camera coordinate system, or equivalently a surface map of an object with respect to its local coordinate system, is a plurality of three-dimensional points that illustrate a surface in three dimensions), derived from alternative imaging modalities (e.g., CT scan, X-ray, or MRI), may also be provided to a surgeon by overlaying them onto the video images of the surgical site. In the right viewfinder 401R, a right image 410R, rendered from a three-dimensional map such as from a CT scan, may be merged onto or overlaid on the right image 400R being displayed by the display device 402R. In the left viewfinder 401L, a rendered left image 410L is merged into or overlaid on the left image 400L of the surgical site provided by the display device 402L. In this manner, a stereo image may be displayed to map out organ location or tissue location information in the surgical site to the operator O in the control of the robotic instruments in the surgical site, augmenting the operator's view of the surgical site with information that may not be directly available or visible by an endoscopic camera in the surgical site.

While a stereo video endoscopic camera 101C has been shown and described, a mono video endoscopic camera generating a single video channel of frames of images of the surgical site may also be used in a number of embodiments of the invention. Rendered images can also be overlaid onto the frames of images of the single video channel.

Tool Tracking

Tool tracking has a number of applications in robotic surgical systems. One illustrative application of tool tracking is to automatically control the motion of the endoscopic camera so that a surgeon automatically views regions of interest in a surgical site, thus freeing the surgeon from the camera control task. For example, robotic instruments are tracked so that the endoscopic camera can keep them centered in its field of view of the surgical site. Another illustrative application of accurate tool tracking is moving a robotic instrument to reach a surgical target (e.g., a tumor), either automatically or by a surgeon. For a target such as a tumor that may be occluded, other real-time imaging modalities, such as ultrasound or pre-scanned images, may be used with real-time tool tracking to move a robotic instrument to the tumor and remove it. Other illustrative applications of tool tracking include a graphic user interface (GUI) that facilitates the entrance and re-entrance of robotic instruments during surgery. Tool tracking is very useful when robotic instruments are not in the field of view of the endoscopic camera or are otherwise obscured in the field of view of the camera. In such scenarios, robotic kinematics provides information through the proposed state-space model.

In one embodiment of the invention, a tool tracking system and architecture is provided that fully integrates kinematics information and visual information for robust and accurate tool tracking performance. Results of tool localization and tracking are made accurate and reliable by adaptively combining together robust kinematics information and accurate geometric information derived from video. The tool tracking system performs locating (determining absolute locations/poses with stereo video), tracking (integrating visual and kinematics information), and predicting (using kinematics while the tool or a portion thereof is not visible) functions.

Technical capabilities in the tool tracking system include an analysis-by-synthesis approach for image matching and a sequential Bayesian approach that fuses together visual and kinematic information.

An analysis-by-synthesis capability makes it possible to explore the prior information that is of concern, such as information about the tools rather than the tissue and surrounding environment. The basic procedure in analysis-by-synthesis is to synthesize an image based on the model (geometry and texture) and the current pose (position/location and orientation) of a tool and then compare it against real images. The error between the real and synthesized images is the driving force for a better estimate of the tool pose. To make this approach more robust (e.g., handling varying illumination and the natural wear of a tool), appearance-based learning may be applied to update the model for a specific tool. Alternatively, matching may be performed using features, such as edges and/or corners, that are more robust to lighting variation.

To obtain the absolute pose (location and orientation), stereo imaging may be used along with the analysis-by-synthesis (ABS) techniques. Instead of just a feature-point-based stereo that may require markers, a stereo approach may be provided based on the tool (or some of its parts). To further improve robustness, stereo and ABS techniques may be applied to a sequence of images (e.g., the same location and different orientations, or different locations and different orientations). Sequence-based matching makes the procedure less vulnerable to local minima in the process of optimization/estimation.

A sequential Bayesian approach may be applied to fuse visual information and robot kinematics to efficiently track tools. In this approach, the states provide the zero-order kinematics of the tools (e.g., current position and orientation, or pose) and the first-order kinematics of the tools (e.g., translational and angular velocity). Depending upon the complexity of the underlying physical system, a higher-order or lower-order state-space model may be adopted. For a linear state-space model, linear Kalman filtering may be applied when the noise can be approximated as Gaussian. For a nonlinear state-space model, extended Kalman filtering may be used to filter out noise from observations and state dynamics. For a complex state-space model that requires a probability density function (pdf), a sequential Monte Carlo method (e.g., particle filtering) can be applied, where the probability density function may be represented by a set of representative samples.
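
As an illustration of this fusion step, the following is a minimal sketch (not the patented implementation) of a linear Kalman filter in which relative kinematics drives the prediction and a vision-derived position drives the correction; the state definition, matrices, and noise levels are simplifying assumptions.

```python
import numpy as np

# Minimal sketch: a linear Kalman filter that fuses relative kinematics (used as
# the motion model) with a vision-based position measurement.
# State = [x, y, z] tool position in camera coordinates; H = identity.
class ToolKalmanFilter:
    def __init__(self, initial_pos, pos_var=1e-2):
        self.x = np.asarray(initial_pos, dtype=float)   # state estimate
        self.P = np.eye(3) * pos_var                     # state covariance
        self.Q = np.eye(3) * 1e-4                        # kinematics (process) noise
        self.R = np.eye(3) * 1e-2                        # vision (observation) noise

    def predict(self, delta_kinematics):
        """Propagate the state with the relative kinematic motion between frames."""
        self.x = self.x + np.asarray(delta_kinematics, dtype=float)
        self.P = self.P + self.Q
        return self.x

    def update(self, vision_pos):
        """Correct the prediction with a vision-derived position measurement."""
        z = np.asarray(vision_pos, dtype=float)
        S = self.P + self.R                 # innovation covariance (H = I)
        K = self.P @ np.linalg.inv(S)       # Kalman gain
        self.x = self.x + K @ (z - self.x)
        self.P = (np.eye(3) - K) @ self.P
        return self.x

# Example: predict with kinematics every cycle, correct when vision is available.
kf = ToolKalmanFilter(initial_pos=[0.0, 0.0, 0.1])
kf.predict(delta_kinematics=[0.001, 0.0, 0.0])
kf.update(vision_pos=[0.0012, 0.0001, 0.1001])
```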

One challenge in fusing information from the various sources is that the sources should be correctly characterized, for example, in terms of observation noise characteristics. Taking the simplest case of Gaussian noise that can be characterized by means and covariances, the covariance matrices can be estimated quite robustly since the geometry of the robotic instruments (location and orientation) and their relative positions are known. For example, an observation (e.g., the pixel location of a feature point) from a robotic instrument that is under a slanted viewing angle or is occluded by other tools has a large variance.

Referring now to FIG. 5A, tool tracking involves determining a pose of a robotic instrument 101, including its position or location $(X_t, Y_t, Z_t)$ and its orientation in the camera coordinate system, as it moves in, around, and out of the surgical site. A full pose description may not only include the location and orientation of a robotic instrument in a three-dimensional space but may further include the pose of an end-effector, if any. Positional information or pose as used herein may refer to one or both of the location and orientation of a robotic instrument.

A sequence of left video frames 500L within a camera coordinate system and a sequence of right video frames 500R within another camera coordinate system (or one pair of video frames/images from left and right views) may be used for tool tracking in one embodiment of the invention. Alternatively, a single view with a single sequence of video frames may be used for tool tracking in another embodiment of the invention. Alternatively, a single video frame may be used for tool tracking in yet another embodiment of the invention, providing partially corrected pose estimates.

A marker 502 on the robotic instrument 101 may be used to assist in the tool tracking if visible or otherwise sensible. In one embodiment of the invention, the marker 502 is a painted marker minimally altering the robotic instruments. In other embodiments of the invention, markerless tool tracking is provided with no modification of the robotic instruments. For example, natural image features of a robotic tool may be detected as natural markers, and/or the image appearance of the tools and the CAD model of the tools may be used to provide tool tracking.

In one embodiment of the invention, video information from an endoscopic camera and kinematics of the robotic arm and robotic instrument are used as cues to determine the pose of the robotic instrument in the surgical site. If the robotic instrument is not visible to the camera, the video information alone is insufficient to determine position, but the kinematics adds the missing information to determine the robotic instrument pose. Moreover, even if the video information is available, the addition of the kinematics information makes the computation of pose more robust. A tool tracking system is provided based on robot kinematics and vision that requires little to no modification of the robotic instruments 101 and the pre-existing surgical system 100.

Kinematics information provided by the surgical system 100 may include the kinematic position $k_t^P$, kinematic orientation $k_t^\Omega$, kinematic linear velocity $\dot{k}_t^P$, and kinematic angular velocity $\dot{k}_t^\Omega$ of one or more robotic instruments 101. The kinematics information may be the result of movement of the robotic surgical arm 158, the robotic instrument 101, or both the robotic surgical arm 158 and the robotic instrument 101 at a given time. The kinematics information provided by the surgical system 100 may also include the kinematic position $k_t^P$, kinematic orientation $k_t^\Omega$, kinematic linear velocity $\dot{k}_t^P$, and kinematic angular velocity $\dot{k}_t^\Omega$ of the endoscopic camera to provide a frame of reference.
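
For illustration only, a kinematic datum of this kind might be represented as follows; the field names and the camera-frame conversion are assumptions of this sketch, not the surgical system's actual data format.

```python
from dataclasses import dataclass
import numpy as np

# Illustrative container for one kinematic datum as described above.
@dataclass
class KinematicDatum:
    position: np.ndarray          # k_t^P, 3-vector in the reference frame
    orientation: np.ndarray       # k_t^Omega, e.g., a 3x3 rotation matrix
    linear_velocity: np.ndarray   # time derivative of k_t^P, 3-vector
    angular_velocity: np.ndarray  # time derivative of k_t^Omega, 3-vector
    timestamp: float              # acquisition time in seconds

def tool_in_camera_frame(tool: KinematicDatum, camera: KinematicDatum) -> np.ndarray:
    """Express the tool position in the camera frame, using the camera's
    kinematic pose as the frame of reference (rigid-transform assumption)."""
    return camera.orientation.T @ (tool.position - camera.position)
```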

Referring now to FIG. 6A, a functional block diagram of a tool tracking architecture and methodology for a surgical system is illustrated in accordance with embodiments of the invention. The main operational stages of tool tracking are illustrated in the middle column of FIG. 6A. Key technical capabilities associated with tool tracking are illustrated in the left column of FIG. 6A, except for the operational constraints 601. The end results of the tool tracking methodology are illustrated in the right column of FIG. 6A.

The key technical components of the methodology and architecture may be further categorized into basic building blocks and supporting blocks. The basic building blocks include image matching 609 and a state-space model 613, which are used to provide efficient tool tracking; each is responsive to visual information. The supporting blocks include model-based synthesis 611, adaptive fusion 615, and sequence matching 607 to support the implementation of robust and accurate tool tracking.

Adaptive fusion 615 fully explores prior information that may be available, including prior kinematics information and prior visual information.

In a robotic surgery system, vision information and kinematics information are typically available with known characteristics. Robot kinematics information is usually stable and often accurate but may drift over long periods of time. Vision-based information is very accurate when it can be reliably estimated; otherwise, vision-based information may be very inaccurate.

In embodiments of the invention, adaptive fusion is used to obtain accurate information fusion from different sources of information as well as similar sources of information. With adaptive fusion, if the vision-based information is known to be accurate, then the information fusion is heavily biased towards the vision-based information. If the vision-based information is known to be unreliable or inaccurate, robot kinematics information is used over the vision-based information to generate a more robust fusion of information. While the quality of robot kinematics is typically uniform, the quality of vision information in terms of image matching and 3D pose estimation varies considerably. View geometry statistics may be used to determine the reliability and accuracy of video-based information. Adaptive fusion may also be used to obtain accurate information fusion from similar sources of information.
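
The following sketch illustrates one way such adaptive weighting could work, using inverse-variance weighting with a vision variance inflated near a viewing singularity; the specific weighting rule and numbers are assumptions, not the system's actual view geometry statistics.

```python
import numpy as np

# Minimal sketch of adaptive fusion by inverse-variance weighting. The mapping
# from viewing geometry to vision variance (via the angle between the tool
# shaft and the optical axis) is an illustrative assumption.
def vision_variance(view_angle_rad, base_var=1e-4, max_var=1e-1):
    """Inflate the vision noise variance as the view approaches a singular,
    slanted geometry (shaft nearly aligned with the optical axis)."""
    reliability = abs(np.sin(view_angle_rad))       # 1.0 = well-conditioned view
    return base_var / max(reliability, base_var / max_var)

def fuse(kin_pos, kin_var, vis_pos, vis_var):
    """Combine kinematic and vision position estimates, weighting each by the
    inverse of its variance (per-axis, assuming independent Gaussian noise)."""
    w_kin = 1.0 / kin_var
    w_vis = 1.0 / vis_var
    return (w_kin * np.asarray(kin_pos) + w_vis * np.asarray(vis_pos)) / (w_kin + w_vis)

# When the view is well conditioned the fused pose follows vision closely;
# near a viewing singularity the kinematics estimate dominates.
fused = fuse(kin_pos=[10.0, 2.0, 80.0], kin_var=1e-2,
             vis_pos=[10.3, 2.1, 79.5], vis_var=vision_variance(np.deg2rad(75)))
```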

Model-based synthesis is used herein to refer generally to the generation or rendering of a template image for use in subsequent matching operations, and includes full synthesis, geometry-only synthesis, and implicit synthesis. Full synthesis, as the name implies, is a complete synthesis of an image of the robotic instrument. For example, robotic instrument images are generated from a computer aided design (CAD) model based on its geometry and texture. Other prior information (the location/orientation of the model), not necessarily accurate, is presented along with the synthesized robotic instrument images for image matching 609. Geometry-only synthesis is the case where the geometry of the robotic instrument is used to synthesize geometry-only images (e.g., edge images). The texture of the model is not used in geometry-only synthesis. Implicit synthesis is the case where images are not actually synthesized. Instead, the model (either geometry or texture or both) is implicitly used to perform image matching. For example, the geometric properties (e.g., width, length, shape) of markers and the geometric relationships among them (e.g., markers forming a line), when available in a marker-based tracking system, may be used to improve image matching.

In one embodiment of the invention, sequence matching is where objects or features in a sequence of images captured from one camera view are matched against objects or features in a sequence of images captured from a different camera view. In another embodiment of the invention, sequence matching is where objects or features in a sequence of images from a camera view are matched against objects or features in a sequence of synthesized images.

There are two main operational stages in a tool tracking system: tool acquisition 604 and tool tracking 606. Tool acquisition may also be referred to herein as localization.

The goal of the tool acquisition stage 604 is to obtain the absolute pose information (location and orientation) of the one or more robotic instruments within the field of view of one or more cameras, such as the stereo endoscopic camera 101C. The tool acquisition stage 604 performs a locating function 614 resulting in the location and orientation of the tool.

The goal of the tool tracking stage 606 is to dynamically update the absolute pose (location and orientation) of a moving robotic instrument. The tool tracking stage 606 may perform a full-tracking function 616 or a kinematics-tracking (pose prediction) function 618, respectively resulting in either a full-tracking state, when both visual information and robot kinematics information are available, or a kinematics-tracking state, when visual information is not utilized (e.g., the tool is outside the field of view or occluded).

The mode/stage of the tool tracking system changes from tool acquisition to tool tracking after the tool is initially located within the field of view. Provided that the tool remains in the field of view, the tool tracking system may remain in the tool tracking mode/stage. However, the tool tracking system may change from a tool tracking mode/stage into a tool acquisition mode/stage if the tool is removed from the field of view and then returns into the field of view. The tool tracking system may optionally begin operation with an initialization procedure if there is only a single tool in the field of view. If additional tools are to be tracked, the optional initialization procedure may be skipped if other tools have already been located and tracked. If the tools have no markers, the optional initialization procedure may involve moving the tools around in order to obtain a robust localization via sequence matching.

The methodology of the tool tracking is now further described with reference to FIG. 6A, which starts at block 600 and goes to block 602.

At block 602, an optional initialization of the tool tracking system may occur. Mono or stereo video 603 may be used in the tool tracking system and is initialized to begin generation of digital video frames of image data of the surgical site. Kinematics information 605 may also be used in the tool tracking system during initialization to form an initial pose of the robotic instrument. The kinematics information 605 may include positional information, including angular or linear information for example, from sensors located at various places along a robotic arm and the robotic instrument. The kinematics information 605 may be for both the endoscopic camera and robotic instruments such that the relationship between positional information for the robotic instruments and the camera may be determined.

Initialization begins with a single tool in the field of view without any occlusions. The system may be initialized for additional robotic instruments in the field of view. If tools have already been located and tracked and a new tool is being added, the new tool can be initialized by placing it into the field of view with the previously located and tracked tools. If no tool has been located and tracked, each tool may be initialized by placing it within the field of view with all other tools outside the field of view. The robotic instrument being initialized may be placed in the center of the surgical site for optimal estimation across the whole space, or as close to the stereo endoscopic camera 101C as possible to allow for accurate stereo computation. With a markerless system, the robotic instrument may be moved and rotated for reliable sequence matching.

At block 604, the tool tracking system enters a tool acquisition stage/mode in the surgical site. FIG. 9B graphically illustrates the tool acquisition stage in a surgical site. Stereo video images 500L, 500R of the surgical site, such as illustrated in FIG. 5A, are captured by the endoscopic camera 101C, including one or more robotic instruments 101 in the surgical site. Stereo video may be used to obtain an absolute initial pose of the one or more robotic instruments 101 in one embodiment of the invention. In another embodiment of the invention, mono video may be used with kinematics information to estimate the absolute initial pose (position and orientation) of the one or more robotic instruments 101. The one or more robotic instruments 101 may include painted markers 502 to assist in tool acquisition and tool tracking in the surgical site. The tool acquisition stage performs a locating function 614 resulting in the initial pose of the one or more robotic instruments 101 in the surgical site.

After the robotic instruments have been acquired, the methodology goes to block 606.

At block 606, the tool tracking system enters a tool tracking mode or stage in the surgical site. The goal of tool tracking is to update the absolute pose information (location and orientation) based on incremental and/or partial information (visual and robot kinematics). In the tool tracking stage 606, the tool tracking system is at a full-tracking state 616 when visual and kinematics information is available. If a robotic instrument is not visible (e.g., tools inside an organ or occluded by other tools) in the surgical site, the tool tracking system is at a kinematics-tracking state 618 for estimating tool pose.

The tool tracking system may transition from tool tracking 606 and return to tool acquisition 604 if a tracked tool gets out of the field of view and then comes back into the field of view of the camera.

Referring now to FIG. 6B, a flow chart of a tool tracking library and its application is illustrated. A tool tracking application 650 is executed by a system 351 of the robotic surgical system 100. The video board 218 illustrated in FIG. 2 may be a part of the IGS system 351 in order to receive the video images from the endoscopic camera over the surgical site. A kinematics application programming interface (API) 660 provides a software interface to receive the raw kinematics data from the surgical system 100. The kinematics API 660 couples the kinematics information to the tool tracking application 650 and a tool tracking library 652. The raw kinematics data 680 is received by an API streamer thread 658, which provides the physical interface to a communication channel (for example, fiber optic cable or Ethernet) and may buffer the raw kinematics data by storing it into a memory, a hard disk, or other data storage device. The tool tracking library 652 may issue data requests to the API streamer thread 658.

A video capture thread 656 is coupled to the endoscopic camera to receive the raw endoscopic video feed 670. The raw video 670 may be mono video of a single channel or stereo video of left and right channels. The video capture thread 656 may buffer the raw video data by storing it into a frame buffer memory, a hard disk, or other data storage device. A video application programming interface (API) 659 provides the software interface to receive the raw video data from the surgical system into the tool tracking system. The tool tracking library 652 may issue data requests to the video capture thread 656.

The tool tracking library 652 contains the core functionality of combining kinematics (through the kinematics API 660) and video (through the video API 659) for accurate tracking of tools. The library also provides an application program interface so it can be invoked by a customer-designed tool tracking application 650.

In response to the video data and the kinematics data, the tool tracking library 652 generates corrected kinematics data for the pose of a robotic instrument. The raw kinematics data is corrected for the orientation and position of the tools. The corrected kinematics data may be used in a number of applications, such as image guided surgery.

As shown in FIG. 6B, the rate of the raw kinematics 680 may be 100 to 200 Hertz (Hz), the rate of the raw video 670 may be 30 Hz to 60 Hz, and the rate of tool tracking may be even slower. However, the rate of the corrected kinematics 690 should be substantially similar to the rate of the raw kinematics 680 for medical applications. To maintain the rate of the kinematics information, the raw kinematics may be passed through. A correction matrix (rotation and translation) from the tool tracking library may then be used to correct the raw kinematics information. Alternatively, the corrected kinematics 690 may be directly output from the tool tracking library 652, where a correction matrix is applied to the raw kinematics. Either way is feasible because the correction matrix corrects the bias in the raw kinematics and the bias changes slowly, for example, slower than 1 Hz.
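
A minimal sketch of this pass-through-and-correct scheme is shown below; the (R, t) pose representation and numeric values are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of maintaining the full kinematics rate while correcting bias:
# a slowly varying correction (rotation R_c, translation t_c) produced by the
# tool tracking library is applied to every raw 100-200 Hz kinematic sample.
def apply_correction(R_raw, t_raw, R_c, t_c):
    """Compose the correction with a raw kinematic pose: x_corr = R_c x_raw + t_c."""
    R_corr = R_c @ R_raw
    t_corr = R_c @ t_raw + t_c
    return R_corr, t_corr

# The correction changes slower than ~1 Hz, so it is recomputed only when the
# (slower) tool tracking output updates and reused for the samples in between.
R_c, t_c = np.eye(3), np.array([0.002, -0.001, 0.0005])
R_raw, t_raw = np.eye(3), np.array([0.10, 0.02, 0.08])
R_corrected, t_corrected = apply_correction(R_raw, t_raw, R_c, t_c)
```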

Algorithm Architecture to Address Natural & Technical Challenges

Reference is now made to FIGS. 6A, 7, and 8. FIG. 6A, described previously, illustrates a functional block diagram including the operational stages of a tool tracking system. FIG. 7 is a block diagram illustrating the challenges of performing tool tracking. FIG. 8 graphically illustrates a functional block diagram of a tool tracking system 800.

Referring now to FIG. 8, the tool tracking system 800 adaptively fuses visual information and robot kinematics in order to achieve robust, accurate, and efficient tool tracking. The unknown full pose of a robotic instrument, at a time instant t, is represented as a state $s_t$ 805B in a Bayesian state-space model 802. The state-space model 802 may use a plurality of posed states 805A-805C to perform tool tracking in the surgical site. The state-space model 802 may generate the corrected kinematics information 690 of the robotic instrument. A CAD tool model 804 (geometry only, or both geometry and texture) is used for synthesizing (explicitly or implicitly) an image under a given pose (i.e., state).

For updating the state information of a robotic instrument, the relative robot kinematics $\dot{k}_t$ 605 (the dot above the k representing the relative or first-derivative measurement of the kinematics information) between time instants t and t+1 can be coupled into the state-space model 802. Visual information 603 from captured images may be amplified and analyzed by an amplifier/filter 808 to control the influence of visual feedback 809 on the fusion of visual information and kinematics information. The amplifier/filter 808 generally implements how view geometry statistics are applied for adaptive fusion. If stereo images 803 are available, the spatial constraints 807 between left and right image pairs may also be explicitly or implicitly explored to assist in tool tracking.

As illustrated in FIG. 7, there are natural challenges and technical challenges to providing tool tracking. The natural challenges are those imposed by realistic operational scenarios. The technical challenges are those caused by proposed tool tracking algorithms when facing the natural challenges. The natural challenges, for example, include cluttering and occlusion 702, illumination variation and image appearance change 703, and viewing singularity 704. The technical challenges include image segmentation 710 and matching ambiguity 712, for example.

The natural challenge of illumination and image appearance 703 is where the same scene changes dramatically along with the motion of directional light sources. For endoscopic operations, the image intensity of the same object can be different, depending on the distance of the object from the lighting source and the angle between the lighting source and the local surface normal. This makes image-based processing less reliable. In addition, specularities from organs and blood under directional endo-illumination make image processing more challenging.

The natural challenge of viewing singularity 704 may occur when three-dimensional geometry information is derived from two-dimensional images. Three-dimensional geometry information derived from two-dimensional images is not reliable when the two-dimensional projection of a three-dimensional object is degenerate. For example, a three-dimensional cylindrical tool is projected onto an image plane as a circle.

The natural challenge of scene cluttering and occlusion 702 is the case where there could be more than one robotic instrument in the field of view. Additionally, the robotic instruments may be partially or fully submerged in a complex and dynamic background of organ tissue, blood, and smoke caused by electro-dissection.

As previously mentioned, the technical challenges include image segmentation 710 and matching ambiguity 712. Moreover, while efficiency is a concern, a big technical challenge for tool tracking may be reliability and accuracy under realistic situations.

Consider, for example, pure image segmentation 710, i.e., segmentation of tools from a 2D image only; it is a challenging task when the background is cluttered and/or objects are occluded. To handle this particular technical challenge, prior information is explored because a known robotic instrument is being tracked. More specifically, model-based synthesis techniques 722 may be used. With model-based synthesis 722, a CAD model of a robotic instrument may be used to render a clean tool image as a pattern to match against, or to search within a limited region constrained by the pose information of the tool. As a result, pure image segmentation from the real images is avoided. Because the states of all robotic instruments are tracked, mutual occlusions of all these robotic instruments can be calculated, thereby making image matching more reliable.

Another technical challenge in tool tracking, especially markerless tool tracking, is the matching ambiguity of a pair of images 712, either between left and right images or between real and synthesized images. Fundamentally, many areas in an image look alike, and non-corresponding areas of two images may appear to be more alike than two corresponding areas (for example, due to illumination variations), making region-based matching ambiguous. To reduce such ambiguity, sequence matching 728 may be applied, where a sequence of images is matched against another sequence of images. Such a method is useful when robust and accurate relative kinematics information $\dot{k}_t$ is used.

For example, consider a sequence of three real images 811A-811C $[I_{t-1}, I_t, I_{t+1}]$ and three corresponding states 805A-805C $[s_{t-1}, s_t, s_{t+1}]$. For each state, one image can be rendered, such that a sequence of three synthesized images $[I^s_{t-1}, I^s_t, I^s_{t+1}]$ may be formed. Under a regular analysis-by-synthesis scheme, the real image $I_t$ and the synthesized image $I^s_t$ are compared. The difference determined by the comparison is used to update the corresponding state $s_t$. For a three-state sequence, three independent computations are used to update three states. Now if sequence matching 728 is used for the same three-state sequence, the situation changes significantly. For ease of explanation, suppose that perfect or error-less relative kinematics information is available as a two-element sequence $[\dot{k}_{t-1}, \dot{k}_t]$ of kinematics information. This suggests that there is only one unknown (any one of the three states) for the three-state sequence $[s_{t-1}, s_t, s_{t+1}]$ because $s_t = s_{t-1} + \dot{k}_{t-1}$ and $s_{t+1} = s_t + \dot{k}_t$. With one known state of kinematic information, the respective sequence of three real images 811A-811C $[I_{t-1}, I_t, I_{t+1}]$ and the respective sequence of three synthesized images $[I^s_{t-1}, I^s_t, I^s_{t+1}]$ may be used to determine the unknown states. That is, if any one of the three states in the three-state sequence $[s_{t-1}, s_t, s_{t+1}]$ is known, the other missing states can be obtained through the perfect or error-less relative kinematics.
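
The following sketch illustrates the state-propagation idea behind this reduction in unknowns; the additive state parameterization and the generic synthesize/compare callables are assumptions of the sketch, not the system's actual interfaces.

```python
import numpy as np

# Minimal sketch: with error-free relative kinematics, a single unknown anchor
# state determines the whole state sequence, so the three real/synthesized
# image pairs jointly constrain one unknown.
def propagate_states(anchor_state, relative_kinematics):
    """Given s_{t-1} and [k_dot_{t-1}, k_dot_t], return [s_{t-1}, s_t, s_{t+1}]."""
    states = [np.asarray(anchor_state, dtype=float)]
    for k_dot in relative_kinematics:
        states.append(states[-1] + np.asarray(k_dot, dtype=float))
    return states

def sequence_cost(anchor_state, relative_kinematics, real_images, synthesize, compare):
    """Sum per-frame matching costs over the whole sequence; `synthesize` renders
    an image for a state and `compare` scores it against the real image."""
    states = propagate_states(anchor_state, relative_kinematics)
    return sum(compare(synthesize(s), img) for s, img in zip(states, real_images))
```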

The sequence of three real images 811A-811C $[I_{t-1}, I_t, I_{t+1}]$ and the sequence of three synthesized images $[I^s_{t-1}, I^s_t, I^s_{t+1}]$ are then compared to determine a difference to update the current state $s_t$, so that its underlying kinematics information is more accurate and reliable for use in tool acquisition and tool tracking. Thus, sequence matching 728 can provide more robust and more reliable matching, as the number of unknowns is reduced while the same number of observations (real images) is kept.

Additionally, appearance learning techniques 724 may be used to handle image or appearance changes 703, such as from illumination variations and the natural wear of a tool, for example. Generally, appearance learning techniques handle appearance changes by training the tool tracking system on image samples of the same tool under different viewing conditions. Appearance learning techniques have been used extensively in object tracking to handle appearance change due to illumination variations. For example, parametric models have been built to handle illumination variations. Appearance learning techniques are further illustrated herein with reference to FIG. 14 with the use of face images instead of tool images.

Moreover, adaptive fusion techniques 726 may be used to handle the challenges of singularity of viewing geometry, or viewing singularity 704. The technique of adaptive fusion is used to explore the available pose information, i.e., the predicted state (before correction), when feeding geometric information derived from video into the Bayesian state-space model 802. More specifically, video-derived information has much less weight when fused with robot kinematics information under such conditions. In a Bayesian state-space model 802, this manifests itself as a large noise variance in the observation equation.

Adaptive Fusion of Vision and Kinematics

Adaptive fusion may be used to handle the challenges of singularity of viewing geometry in order to provide robust and accurate kinematics information of the tools in a surgical, medical, or other type of robotic system.

Analysis-by-Synthesis for Tool Localization

Pure image segmentation may be used by a tool tracking algorithm to localize tools. Pure image segmentation of tools from a two-dimensional image is straightforward if the tools have distinctive features, such as color marks that may be used to identify a tool. However, operational conditions may make pure image segmentation techniques difficult if not impossible to perform.

Referring now to FIG. 9A, an image has a tool 901 hidden by an occlusion 902. The occlusion 902 is so severe that it breaks key steps (e.g., color- and/or shape-based analysis) of pure image segmentation of the image such that the tool 901 cannot be found therein. The tool shape 901S is substantially covered over by the occlusion shape 902S in the image illustrated in FIG. 9A. In the case of markerless tool tracking, where the tools have no markings, an occlusion can only make it more difficult for pure image segmentation techniques to localize a tool.

Referring now to FIG. 9B, the image of the tool 901 is again hidden by an occlusion 902. Techniques of sequence matching and/or model-based-synthesis matching may be used to localize the robotic instruments instead of pure image segmentation. Sequence matching was briefly discussed previously. Model-based synthesis uses a priori knowledge regarding kinematics and appearance that may be available for the tools that are being tracked.

With model-based synthesis, a CAD model 904A of the tool 901 is used to synthesize an image of the tool given the known or hypothesized pose information for the tool. The pose information for the tool may be determined from kinematics information or otherwise, and a posed synthesized image 904B may then be generated. The posed synthesized image 904B of the tool 901 may then be used to perform image matching or an image search within the overall image of a surgical site to find the location of the tool 901 therein, even though it may be partially occluded. This technique of tool localization may generally be referred to herein as an analysis-by-synthesis approach. Using the synthesized image 904B as a pattern to search for the tool 901 within an image of the surgical site helps overcome the difficulty of an occlusion 902 that may cover the tool 901. The tool image fragments 901′ that remain after the occlusion 902 is subtracted from the tool image are sufficient to determine tool localization. However, if the occlusion 902 completely covers the tool 901, image analysis alone cannot localize the tool.

Alternatively, in another embodiment of the invention, image segmentation may be guided by exploring the available prior kinematics and image information. That is, image segmentation may be constrained to be performed within a limited region of the image of the surgical site based on rough pose information of the tool, in response to the prior robot kinematics and the CAD model 904A of the tool 901. This technique of tool localization may generally be referred to herein as aided image segmentation, in contrast to pure image segmentation that has no constraints.
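
A minimal sketch of such aided segmentation is shown below, restricting a simple intensity-based segmentation to a region of interest predicted from prior kinematics; the ROI margin and threshold are illustrative assumptions, not the system's actual segmentation rule.

```python
import numpy as np

# Minimal sketch of aided image segmentation: instead of segmenting the whole
# frame, restrict processing to a region of interest centered on the tool tip
# location predicted from prior kinematics.
def aided_segmentation(image_gray, predicted_uv, uncertainty_px=60, threshold=200):
    """Return a binary tool mask computed only inside the kinematics-derived ROI.
    image_gray: 2D grayscale frame; predicted_uv: (u, v) pixel prediction."""
    h, w = image_gray.shape
    u, v = int(predicted_uv[0]), int(predicted_uv[1])
    u0, u1 = max(0, u - uncertainty_px), min(w, u + uncertainty_px)
    v0, v1 = max(0, v - uncertainty_px), min(h, v + uncertainty_px)
    mask = np.zeros_like(image_gray, dtype=bool)
    roi = image_gray[v0:v1, u0:u1]
    mask[v0:v1, u0:u1] = roi > threshold   # bright metallic tool vs. darker tissue
    return mask
```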

Image Synthesis and Image Analysis/Search

Referring now to FIG. 9C, image synthesis (also referred to herein as model-based synthesis) and image analysis/search are key steps in using analysis-by-synthesis methods for tool localization and tool tracking. The image synthesis 911 and image analysis/search 915 processes may be repeatedly performed in an iterative optimization approach to find the best tool pose parameter in response to a given cost function CF 913. With an iterative optimization approach, an initial pose hypothesis may be formulated to generate the initial synthesized model tool for computation of an initial cost function. The cost function CF 913 is a function of which corresponding features 912, 914 are used for matching and how an image is synthesized during image synthesis 911. For example, a synthesized edge image 904S of the tool may be generated during image synthesis 911 in response to the CAD geometry of the CAD model 904A of the tool. Alternatively, a synthesized regular image 904B of the tool may be generated during image synthesis 911 in response to the CAD geometry and CAD texture of the CAD model 904A of the tool. The synthesized edge image 904S of the tool may be used to perform image matching with edges in a video image 910A of the tool. The synthesized regular image 904B of the tool may be used to perform image matching with a regular video image 910B of the tool. If the appearance of the tool has changed in the images, e.g., 910B, appearance learning may be used to augment the analysis-by-synthesis process for tool localization and tool tracking. Note that an edge image, such as illustrated in video image 910A, is typically robust against lighting variations.
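
As one hypothetical example of such an edge-based cost, the following sketch scores a synthesized edge image against edges detected in the real video frame using a chamfer-style distance; the OpenCV calls and thresholds are assumptions of the sketch, not the system's actual matching metric.

```python
import cv2
import numpy as np

# Minimal sketch of an edge-based matching cost for comparing a synthesized edge
# image of the tool with the real video image.
def edge_matching_cost(real_gray, synthesized_edges):
    """Average distance from each synthesized tool edge pixel to the nearest
    edge detected in the real (8-bit grayscale) image; lower is a better match."""
    real_edges = cv2.Canny(real_gray, 50, 150)
    # Distance transform measures distance to the nearest zero pixel, so invert
    # the edge map (edges -> 0) before computing it.
    inverted = np.where(real_edges > 0, 0, 255).astype(np.uint8)
    dist = cv2.distanceTransform(inverted, cv2.DIST_L2, 3)
    ys, xs = np.nonzero(synthesized_edges)
    if len(xs) == 0:
        return np.inf
    return float(dist[ys, xs].mean())
```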

With a synthesized tool image I^(s) being synthesized in response to a given tool pose in the camera coordinate system and the CAD model M (with geometry M_(g) and texture M_(t)), the synthesis process may be expressed by an equation as

$\begin{matrix}{{I^{s}\left\lbrack x \right\rbrack} = {L\left( {M_{t}\left\lbrack {\Phi;\left( {P,\Omega,M_{g}} \right)} \right\rbrack} \right)}} & \left( {{Eq}.\mspace{14mu} 1} \right)\end{matrix}$

where x=[x, y] is the image coordinate and Φ is the homogeneous camera geometric projection from three dimensions (3D) into two dimensions (2D). Thus, the model texture M_(t) can be mapped to the coordinate of image I^(s)[x] as a function of the homogeneous camera geometric projection Φ and a combination of the tool pose (position P and orientation Ω of the tool) and the geometry M_(g) of the tool model. For presentation clarity, we omit the nonlinear mapping step from the 2D homogeneous coordinates [x_(w), y_(w), w] after Φ projection to 2D in-homogeneous image coordinates

$\begin{matrix}{{\psi:\left\lbrack {x,y} \right\rbrack} = \left\lbrack {{x_{w}\text{/}w},{y_{w}\text{/}w}} \right\rbrack.} & \left( {{{Eq}.\mspace{14mu} 1}A} \right)\end{matrix}$

In an example image synthesis pipeline, the model will be decomposed into triangles, the 3D vertex coordinates of which will be described in a coordinate system attached to the model. The model coordinates will first be transformed to a world coordinate system before being projected to a 2D display coordinate system by applying the camera model. Once in the 2D display coordinate system, each triangle will be rasterized. The synthesis of the final per-pixel color values may be computed via interpolation of color specified on a per-vertex basis, texture mapping and filtering, and the application of lighting models. (Reference: Computer Graphics: Principles and Practice, by James D. Foley, Andries van Dam, et al., Addison-Wesley Professional; 2nd edition, Aug. 4, 1995, ISBN: 978-0201848403.)

The function L is a mapping function that maps the model texture M_(t) into the real image/appearance I^(s)[x], because the real image varies with lighting conditions and other factors, such as occlusions.

The tool pose may be represented by the position P=[P_(X),P_(Y),P_(Z)]^(T) of a chosen reference point 931R (e.g., the control point before the tool wrist) and the orientation Ω of its local coordinate system 920 originated at the reference point 931R with respect to the camera coordinate system 921. Camera coordinates of a 3D point 931P on the tool that maps to x may be represented by [X,Y,Z]^(T). A local coordinate of the 3D point 931P on the tool that is internal to the tool model may be represented as [X_(M),Y_(M),Z_(M)]^(T). A transformation T_({P,Ω}) of the local coordinate of the 3D point 931P on the tool to the camera coordinate of the tool as a function of the tool pose may be written as

$\begin{matrix}{{\left\lbrack {X,Y,Z,1} \right\rbrack^{T}} = {T_{\{{P,\Omega}\}}\left\lbrack {X_{M},Y_{M},Z_{M},1} \right\rbrack^{T}}} & \left( {{Eq}.\mspace{14mu} 2} \right)\end{matrix}$

where T_({P,Ω}) is a four-by-four 3D-to-3D rigid transformation matrix that can be further decomposed into translational and rotational parts.
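A short Python sketch of applying the rigid transformation of Eq. 2 followed by the projection of Eq. 1A to tool-model points is given below; it assumes a simple pinhole camera described by a 3×3 intrinsic matrix, and the function names are illustrative rather than part of the disclosure.

import numpy as np

def transform_and_project(model_points, T_pose, K):
    """Map tool-model points into the image, following Eq. 2 and Eq. 1A.

    model_points : Nx3 points [X_M, Y_M, Z_M] in the tool-model coordinate frame
    T_pose       : 4x4 rigid transform T_{P,Omega} from model to camera frame
    K            : 3x3 camera intrinsic matrix (standing in for the projection Phi)
    """
    n = model_points.shape[0]
    homogeneous = np.hstack([model_points, np.ones((n, 1))])      # [X_M, Y_M, Z_M, 1]
    camera_pts = (T_pose @ homogeneous.T).T[:, :3]                # Eq. 2
    proj = (K @ camera_pts.T).T                                   # homogeneous image coords
    return proj[:, :2] / proj[:, 2:3]                             # psi: divide by w (Eq. 1A)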

After image synthesis of the synthesized tool image I^(s), an image analysis or image search is performed to find the best estimate of the tool pose. Mathematically, this is an optimization problem that may be written in equation form as

$\begin{matrix}{{T_{\{{P,\Omega}\}}^{*} = {\arg \mspace{11mu} {\min\limits_{T_{\{{P,\Omega}\}}}\mspace{11mu} {C\left( {I^{s},I} \right)}}}},} & \left( {{Eq}.\mspace{14mu} 3} \right)\end{matrix}$

where C is a cost function for comparing images. The synthesized tool image I^(s) may be repeatedly generated in an iterative manner using updated pose information so that the cost function C comparing the synthesized tool image I^(s) with the video images of the surgical site is minimized and the tool pose is optimized.

One of the simplest cost functions is a sum of squared differences (SSD) that may be used to compare the synthesized tool image I^(s) with the video images of the surgical site. However, even though an SSD is a simple cost function, it is nonlinear (e.g., higher than quadratic) in terms of the pose parameters due to the camera geometric projection Φ, which is nonlinear, and the mapping function L from model texture to real image/appearance, which varies with lighting conditions and may itself be non-linear. Minimizing a nonlinear cost function C is a complex optimization problem.
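For illustration, a sum-of-squared-differences cost between a synthesized tool image and an observed image may be sketched in Python as follows; the names are illustrative, and restricting the sum to pixels covered by the rendered tool is an assumption of this sketch.

import numpy as np

def ssd_cost(synthesized, observed, mask=None):
    """Sum-of-squared-differences cost C(I_s, I) between a synthesized tool
    image and the observed video image, optionally restricted to the pixels
    where the synthesized tool is actually rendered."""
    diff = synthesized.astype(np.float64) - observed.astype(np.float64)
    if mask is not None:
        diff = diff[mask]
    return np.sum(diff ** 2)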

Different strategies can be applied to solve a problem of minimizing a nonlinear cost function C. For example, random optimization methods may be used to solve and minimize a nonlinear cost function C in order to avoid an exhaustive parameter search. On the other hand, a quadratic approximation of the cost function may be used to iteratively solve the nonlinear problem.

In one embodiment of the invention, the complex optimization problem may be broken up into two different steps to more efficiently minimize the cost function C. The first step entails performing an image matching where the raw image pixels or extracted image features of I are used for matching against those of the respective synthesized tool image I^(s). The second step involves performing a geometry-only optimization in response to the result of the image matching between the raw image pixels or extracted image features and corresponding ones from the respective synthesized tool image I^(s). Mathematically, these two steps to solve the optimization problem of Eq. 3 may be formulated into the following two equations:

$\begin{matrix}{{\left( \left\{ {x^{m},X^{m}} \right\} \right) = {\arg \mspace{11mu} {\min\limits_{({x,X})}{C\left( {I^{s},I} \right)}}}},{{for}\mspace{14mu} a\mspace{14mu} {given}\mspace{14mu} T_{\{{P,\Omega}\}}}} & \left( {{Eq}.\mspace{14mu} 4} \right)\end{matrix}$

and the best T_({P,Ω}) is determined as

$\begin{matrix}{T_{\{{P,\Omega}\}}^{*} = {\arg \mspace{11mu} {\min\limits_{T_{\{{P,\Omega}\}}}\mspace{11mu} {\sum\; {\left( {x^{m} - {f\left( X^{m} \right)}} \right)^{2}.}}}}} & \left( {{Eq}.\mspace{14mu} 5} \right)\end{matrix}$

Eq. 4 represents the step of finding the corresponding 2D feature points x^(m) from I and 3D points X^(m) on the tool via image matching of I^(s) and I. Eq. 5 represents the geometry-only optimization where the optimal 3D-to-2D mapping T_({P,Ω}) can be found given the matched 2D-3D pairs. The function f( ) is a nonlinear function of the form ψ(ΦT_({P,Ω})).
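The geometry-only step of Eq. 5 is essentially a nonlinear least-squares pose refinement over matched 2D-3D pairs. A hedged Python sketch using a generic solver is given below; the rotation-vector parameterization and the pinhole projection are simplifying assumptions of the sketch, not the only possibilities contemplated, and the names are illustrative.

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def refine_pose(x_matched, X_matched, K, pose0):
    """Geometry-only optimization of Eq. 5: find the pose minimizing
    sum (x^m - f(X^m))^2 over matched 2D points x^m and 3D model points X^m.

    x_matched : Nx2 matched image points
    X_matched : Nx3 matched 3D points in the tool-model frame
    K         : 3x3 camera intrinsics
    pose0     : initial pose guess as a 6-vector [rx, ry, rz, tx, ty, tz]
                (rotation vector + translation), e.g. from kinematics
    """
    def residuals(pose):
        R = Rotation.from_rotvec(pose[:3]).as_matrix()
        t = pose[3:]
        cam = X_matched @ R.T + t                 # model -> camera frame
        proj = cam @ K.T                          # homogeneous projection
        uv = proj[:, :2] / proj[:, 2:3]           # psi: divide by w
        return (uv - x_matched).ravel()           # x^m - f(X^m)

    result = least_squares(residuals, pose0)      # nonlinear least-squares solve
    return result.x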

In cases where the initial pose hypothesis is not close to the true pose, or in cases where it is desirable to obtain a very accurate pose estimate, the foregoing steps to solve the optimization problem (Eqs. 4 and 5) combined with the synthesis step (Eq. 1) can be repeated in an iterative procedure.

The image matching and analysis-by-synthesis processes may be incorporated into a sequential framework for fusion of vision and kinematics information to obtain more accurate positional information of a tool than would otherwise be available from each alone.

Appearance Learning

Referring now to FIG. 14, a diagram illustrating appearance learning of objects within an image is illustrated. As discussed previously, appearance learning techniques may be used to handle image and appearance changes, such as from illumination variations and natural wear of a tool, for example. The appearance variations due to changes in illumination may exhibit illumination subspace/cone phenomena or spherical harmonics, for example. Appearance learning techniques generally train the tool tracking system on image samples of the same tool under different viewing conditions. Pose-specific learning techniques, as well as clustering or manifold learning techniques, may be used to train the tool tracking system over a large number of samples.

In FIG. 14, basis images for illumination variations 1401A, 1401B, 1401C may be used to train the tool tracking system to generate one or more synthesized images 1402A-1402B that are more closely matched to the respective real images 1405A-1405B that may be captured under different lighting conditions.

Appearance learning techniques have been used extensively in object tracking to handle appearance change due to illumination variations (reference: G. Hager and P. Belhumeur, "Efficient Region Tracking with Parametric Models of Geometry and Illumination," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 20, pp. 1025-1039, 1998). For example, parametric models have been built to handle illumination variations (reference: H. Murase, S. Nayar, "Learning and Recognition of 3-D Objects from Brightness Images," Proc. AAAI Fall Symposium, Machine Learning in Computer Vision, pp. 25-29, 1993).

Sequential Adaptive Fusion of Vision and Kinematics

After image matching, such as through analysis-by-synthesis for example, the next step to obtain more accurate positional information is the fusion of image positional information and kinematics positional information of the robotic instruments. In general, the purpose of information fusion is to provide more robust and/or more accurate positional information for the robotic instruments in the surgical site such that tool tracking information may be applied in various ways to obtain accurate results, e.g., measurements of certain physical entities within a surgical site. Key to successfully fusing information together from similar sources or from different sources is determining how to adjust the contributions of each to the fusion. The contribution of sources to the information fusion may be adjusted in different ways, such as by a winner-take-all or a weighted averaging method, for example.

Ideally, all sources of information should be fused together so that the information fusion constantly provides the best accuracy and the most robust tool tracking. However, due to the dynamic nature of systems, there typically is a trade-off between accuracy and robustness in information fusion. As a result of the trade-offs, typical practical approaches to information fusion tend to have compromised results.

State-Space Model for Incorporating Vision and Kinematics

Referring now to FIGS. 10A-10B, a state-space model is now described to adaptively fuse together robot kinematics information and vision-based information. Both raw robotic kinematics information 1010 of the robotic instrument and vision-based information 1011 can be used to generate the state variables 1000A-1000D. A tool model 804 may be used to synthesize 611 the synthesized images 810 in response to the state variables 1000A-1000D. An image analysis 806, 609 is performed comparing the synthesized images 810 of the robotic instrument with the observed images 1011.

Some real-world data analysis tasks involve estimating unknown quantities from given observations. Moreover, a priori knowledge about the phenomena underlying a number of applications may be available, allowing us to formulate Bayesian models involving probability and statistics.

If the unknown quantities of information fusion at corresponding time instances are defined as state variables 1000A-1000D (of the system) and a Markov chain model is assumed among the states at different time instances, then a state-space model may be formed. The state-space model may include 1) a dynamic/state model that relates state variables 1000A-1000D at different time instances (t−1, t, t+1, t+2) and 2) an observation model that relates state variables S 1000A-1000D to observations O 1002A-1002D. In the case of Gaussian noise, the state-space model (a discrete version is shown; a continuous version is similar and involves temporal integration) may be described by the following set of mathematical equations,

$\begin{matrix}\left\{ \begin{matrix}{{Initial}\mspace{14mu} {estimate}} & s_{0} \\{{Dynamic}\mspace{14mu} {model}} & {s_{t} = {{Ds}_{t - 1} + v_{t}}} \\{{Observation}\mspace{14mu} {model}} & {o_{t} = {{Hs}_{t} + w_{t}}}\end{matrix} \right. & \left( {{{Eq}.\mspace{14mu} 5}A} \right)\end{matrix}$

where D and H are the dynamic matrix and observation matrix, respectively. v_(t) and w_(t) are the dynamic noise and the observation noise, respectively, which have Gaussian distributions N(μ_(s),C_(d)) and N(μ_(o),C_(o)). C_(d) and C_(o) are the covariance matrices for the dynamic model and observation model, respectively.

State-space models have been used in many disciplines of science and engineering, and may be referred to by different names such as Bayesian filtering, optimal filtering, stochastic filtering, or on-line inference, for example.

If the state-space model is linear and the modeling noises of the system are Gaussian, then a substantially exact analytic expression can be derived to solve the on-line estimation problem for the state variables of information fusion. In this case, the analytic expression that is well known and widely used is a Kalman filter (see R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems," Trans. of the ASME, Journal of Basic Engineering, Vol. 82, pp. 35-45, 1960). In the case of a non-linear system and Gaussian noise, Extended Kalman Filtering (EKF) (reference: Greg Welch and Gary Bishop, An Introduction to the Kalman Filter, Dept. of Computer Science Tech Report 95-041, University of North Carolina, updated 2006) can be applied, where the non-linear system is approximated by linearizing the models (the non-linear dynamic model, the non-linear observation model, or both) based on the previous estimate and applying Kalman filtering.
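For the linear-Gaussian case of Eq. 5A, one predict/update cycle of the Kalman filter may be sketched in Python as follows; the variable names mirror Eq. 5A, and the sketch is illustrative rather than a complete fusion implementation.

import numpy as np

def kalman_step(s_prev, P_prev, o_t, D, H, C_d, C_o):
    """One predict/update cycle of the linear Kalman filter for the
    state-space model of Eq. 5A (state s, observation o, dynamic matrix D,
    observation matrix H, noise covariances C_d and C_o)."""
    # Predict with the dynamic model s_t = D s_{t-1} + v_t
    s_pred = D @ s_prev
    P_pred = D @ P_prev @ D.T + C_d
    # Update with the observation model o_t = H s_t + w_t
    innovation = o_t - H @ s_pred
    S = H @ P_pred @ H.T + C_o
    K_gain = P_pred @ H.T @ np.linalg.inv(S)
    s_new = s_pred + K_gain @ innovation
    P_new = (np.eye(len(s_prev)) - K_gain @ H) @ P_pred
    return s_new, P_new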

However, for more complex problems that do not obey the linear model and Gaussian noise assumptions, a set of more general methods is required. The general method for estimating the state variables is often referred to as Sequential Monte Carlo (SMC) methods (reference: J. Liu and J. R. Chen, "Sequential Monte Carlo Methods for Dynamic Systems," Journal of the American Statistical Association, Vol. 93, pp. 1032-1044, 1998). In SMC methods (also referred to as particle filters), states are represented by a posterior probability density function (pdf) and sampling techniques are used to generate the posterior probability density function (pdf).

In a number of embodiments of the invention, a state-space model based approach is used to adaptively fuse robot kinematics and vision information. In the following discussion, s represents a state vector of the states 1000A-1000D. In practice, s may be a vector version of the matrix T_({P,Ω}) (Eq. 1), or a collection of point positions [X,Y,Z]^(T), etc., which are equivalent in theory but may be chosen based on a particular application.

Respective velocity information may be readily added to the state-space model by taking the first derivative of the positional information.

Without any loss of generality, consider for example the position P and orientation Ω and the first derivatives {dot over (P)} and {dot over (Ω)} of a robotic instrument. Position and velocity may be represented mathematically as

$\begin{matrix}\left\{ \begin{matrix}{Position} & {P = \left\lbrack {P_{X},P_{Y},P_{Z}} \right\rbrack^{T}} \\{{Linear}\mspace{14mu} {Velocity}} & {\overset{.}{P} = \left\lbrack {{\overset{.}{P}}_{X},{\overset{.}{P}}_{Y},{\overset{.}{P}}_{Z}} \right\rbrack^{T}}\end{matrix} \right. & \left( {{Eq}.\mspace{14mu} 6} \right)\end{matrix}$

The orientation and angular velocity of the robotic instrument may be represented mathematically using a unit quaternion, for its minimal-style representation and operational efficiency, as

$\begin{matrix}\left\{ \begin{matrix}{Orientation} & {\Omega = \left\lbrack {\theta_{0},\theta_{x},\theta_{y},\theta_{z}} \right\rbrack^{T}} \\{{Angular}\mspace{14mu} {Velocity}} & {\left\lbrack {0,{\overset{.}{\Omega}}^{T}} \right\rbrack^{T} = \left\lbrack {0,\omega_{x},\omega_{y},\omega_{z}} \right\rbrack^{T}}\end{matrix} \right. & \left( {{Eq}.\mspace{14mu} 7} \right)\end{matrix}$

Combining the position and orientation vectors with their first derivatives, the state vector may be represented mathematically as a column vector as follows:

$\begin{matrix}{s_{t} = \begin{bmatrix}P_{t} \\\Omega_{t} \\{\overset{.}{P}}_{t} \\{\overset{.}{\Omega}}_{t}\end{bmatrix}} & \left( {{Eq}.\mspace{14mu} 8} \right)\end{matrix}$

The filter state-space model may be implemented with extended Kalman filtering (simple to implement but maybe not sufficient), unscented Kalman filtering (easy to implement and maybe sufficient; see S. J. Julier and J. K. Uhlmann, "A New Extension of the Kalman Filter to Nonlinear Systems," in Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defense Sensing, Simulation and Controls, 1997), or computationally expensive particle filtering.

Extended Kalman filtering or particle filtering may be used for tool tracking because 1) the dynamic state-space model is non-linear due to the quaternion representation, and 2) the observation model is non-linear in the case of using 2D images as observations and linear in the case of using stereo-derived 3D points as observations. In one embodiment of the invention, we adopt the following nonlinear equation to model the transfer of the system from state s_(t−1) to state s_(t):

$\begin{matrix}{{d\left( s_{t - 1} \right)} = \begin{bmatrix}{P_{t - 1} + {\overset{.}{P}}_{t - 1}} \\{q\left( {\Omega_{t - 1},{\overset{.}{\Omega}}_{t - 1}} \right)} \\{\overset{.}{P}}_{t - 1} \\{\overset{.}{\Omega}}_{t - 1}\end{bmatrix}} & \left( {{Eq}.\mspace{14mu} 9} \right)\end{matrix}$

where the non-linear part comes from the quaternion operation Ω_(t)=q(Ω_(t−1),{dot over (Ω)}_(t−1)). The Jacobian matrix for Eq. 9 is as follows:

$\begin{matrix}{D = \begin{bmatrix}I_{3 \times 3} & 0_{3 \times 4} & I_{3 \times 3} & 0_{3 \times 3} \\0_{4 \times 3} & Q_{4 \times 4} & 0_{4 \times 3} & 0_{4 \times 3} \\0_{3 \times 3} & 0_{3 \times 3} & I_{3 \times 3} & 0_{3 \times 3} \\0_{3 \times 3} & 0_{3 \times 3} & 0_{3 \times 3} & I_{3 \times 3}\end{bmatrix}} & \left( {{Eq}.\mspace{14mu} 10} \right)\end{matrix}$

where Q is a skew-symmetric matrix given by

$\begin{matrix}{Q = {{\frac{1}{2}\begin{bmatrix}0 & {- \omega_{x}} & \omega_{y} & \omega_{z} \\\omega_{x} & 0 & {- \omega_{z}} & \omega_{y} \\{- \omega_{y}} & \omega_{z} & 0 & {- \omega_{x}} \\{- \omega_{z}} & {- \omega_{y}} & \omega_{x} & 0\end{bmatrix}}.}} & \left( {{Eq}.\mspace{14mu} 11} \right)\end{matrix}$
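A Python sketch of the nonlinear state propagation of Eq. 9 follows. The scalar-first quaternion convention and the particular sign convention of the quaternion rate are assumptions of the sketch; quaternion sign conventions vary and should be matched to the convention used for Ω and Q above.

import numpy as np

def propagate_state(s_prev, dt=1.0):
    """One step of the nonlinear dynamic model of Eq. 9: constant-velocity
    translation plus quaternion orientation update q(Omega, Omega_dot).

    s_prev : state vector [P(3), Omega(4, scalar-first quaternion),
             P_dot(3), Omega_dot(3)] as in Eq. 8
    """
    P, q, P_dot, omega = s_prev[:3], s_prev[3:7], s_prev[7:10], s_prev[10:13]
    P_new = P + dt * P_dot                                   # position update
    # Quaternion rate: q_dot = 1/2 * q (x) [0, omega]  (one common convention)
    wx, wy, wz = omega
    q0, q1, q2, q3 = q
    q_dot = 0.5 * np.array([
        -q1 * wx - q2 * wy - q3 * wz,
         q0 * wx + q2 * wz - q3 * wy,
         q0 * wy - q1 * wz + q3 * wx,
         q0 * wz + q1 * wy - q2 * wx,
    ])
    q_new = q + dt * q_dot
    q_new = q_new / np.linalg.norm(q_new)                    # keep unit quaternion
    return np.concatenate([P_new, q_new, P_dot, omega])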

Observations of Robot Kinematics and Vision

Sets of observations are available to the state-space model, including robot kinematics (k_(t) ^(P) and k_(t) ^(Ω)) 1010 and vision 1011. In one embodiment of the invention, image measurements, the 2D locations of feature points x_(i), are directly used to construct the observation vector O. Such a choice makes the observation equation for the vision part nonlinear due to the perspective projection of a 3D tool onto one 2D image in the case of a monocular view, or onto left and right images in the case of a stereo view.

Alternatively, vision-derived 3D points X_(s,i) (obtained through matching stereo images, for example) may be used as the vision observations to construct the observation vector O. In such a case, the observation covariance matrix C_(o) consists of two parts: one for kinematics (the leading diagonal sub-matrices of Eq. 12) and one sub-matrix, a covariance matrix C_(o,v), for vision, as follows:

$\begin{matrix}{C_{o} = \begin{bmatrix}{\sigma_{k^{P}}^{2}I_{3 \times 3}} & \; & \; & \; & \; \\\; & {\sigma_{k^{\Omega}}^{2}I_{3 \times 3}} & \; & \; & \; \\\; & \; & {\sigma_{{\overset{.}{k}}^{P}}^{2}I_{3 \times 3}} & \; & \; \\\; & \; & \; & {\sigma_{{\overset{.}{k}}^{\Omega}}^{2}I_{3 \times 3}} & \; \\\; & \; & \; & \; & C_{o,v}\end{bmatrix}} & \left( {{Eq}.\mspace{14mu} 12} \right)\end{matrix}$

In such a case, the observation equation is linear for the vision part and we can construct the observation covariance matrix C_(o,v). For example, we have the following covariance matrix for the case of a parallel camera setup (FIG. 12B):

$\begin{matrix}{C_{o,v} = \begin{bmatrix}C_{o,v,1} & \; & \; & \; \\\; & C_{o,v,2} & \; & \; \\\; & \; & \ldots & \; \\\; & \; & \; & C_{o,v,n}\end{bmatrix}} & \left( {{Eq}.\mspace{14mu} 13} \right)\end{matrix}$

where the view-geometry variance matrices C_(o,v,i) (for each 3D point X_(s,i)) are related to 1) the uncertainty (standard deviation) of matching stereo images, 2) the inverse of the image resolution (for example, a high-definition camera offers better accuracy than a standard-definition camera), and 3) the square of the true values of X_(s,i).
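Assembling the block-diagonal observation covariance of Eqs. 12 and 13 may be sketched in Python as follows; the per-point 3×3 vision covariances are assumed to have been computed beforehand (e.g., from the view-geometry statistics discussed below), and the names are illustrative.

import numpy as np
from scipy.linalg import block_diag

def observation_covariance(sigma_kP, sigma_kOmega, sigma_kPdot, sigma_kOmegadot,
                           vision_point_covs):
    """Assemble the block-diagonal observation covariance C_o of Eq. 12:
    kinematics blocks followed by the vision block C_{o,v} of Eq. 13.

    vision_point_covs : list of 3x3 covariances C_{o,v,i}, one per
                        stereo-reconstructed 3D point X_{s,i}
    """
    kin_blocks = [
        sigma_kP ** 2 * np.eye(3),
        sigma_kOmega ** 2 * np.eye(3),
        sigma_kPdot ** 2 * np.eye(3),
        sigma_kOmegadot ** 2 * np.eye(3),
    ]
    C_ov = block_diag(*vision_point_covs)     # Eq. 13
    return block_diag(*kin_blocks, C_ov)      # Eq. 12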

In a number of embodiments of the invention, first-order kinematic information (velocity information) ({dot over (k)}_(t) ^(P) and {dot over (k)}_(t) ^(Ω)) is provided that may provide an extra constraint with respect to the evolution (e.g., changes from state to state) of the dynamic system. In other embodiments of the invention, first-order vision observations may be derived by tracking image points (i.e., temporal image matching) to provide an extra constraint.

In embodiments of the invention with an analysis-by-synthesis methodology, estimated states (e.g., tool poses) may be used to synthesize images for image matching against real images to establish correspondence between 3D points on the tool model and 2D points on the tool image. With visual correspondence, observation vectors may be formed for the vision-based information, and the observation equation is non-linear due to the perspective projection of 3D points to 2D points.

Bundle Adjustment for Improved Results

In the case of Gaussian noises and linear systems, Kalman filtering provides a perfect sequential solution to a batch least-squares optimization problem because of the Markovian assumption that, given the present state, future states are conditionally independent of the past states. However, tool tracking in reality is a nonlinear problem and it is often not easy to achieve an accurate solution with a sequential approach such as extended Kalman filtering. To achieve accurate results with extended Kalman filtering given a non-linear system, an iterative bundle adjustment process that combines all observations across multiple states may be used, starting from initial results provided by the extended Kalman filtering 802. As a general optimization method, bundle adjustment has wide applications.

For example, bundle adjustment can be used for the sequence matching to be discussed later. To give a specific example in the case of stereo image matching, we have spatial image matching across two views and temporal image matching within each image sequence. By combining all of this redundant information together through bundle adjustment, we can achieve improved results compared to results based on a single stereo pair.

Bundle adjustment, an optimization technique in photogrammetry, refers to the "bundles" of light rays leaving each 3D feature and converging on each camera center, which are "adjusted" optimally with respect to both feature and camera positions (reference: C. Salma, C. Theurer, and S. Henrikson, Manual of Photogrammetry, American Society of Photogrammetry, Falls Church, 1980). A bundle adjustment problem is essentially a large sparse geometric parameter estimation problem that is nonlinear in nature. To implement bundle adjustment, iterative procedures may be used, such as the well-known Levenberg-Marquardt algorithm (see "Numerical Recipes in C: The Art of Scientific Computing," William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery, Second edition, Cambridge University Press, 1992).

The following describes one example of applying bundle adjustment to improve the pose estimate for the problem of tool tracking.

In tool tracking, estimates of the poses of a robotic instrument are needed for the state-space model. Bundle adjustment is used to optimize the estimated poses of the robotic instrument in a chosen coordinate system, e.g., the camera-centered coordinate system. That is, it is desirable to obtain the relative orientation of the tool with respect to the camera.

There are a number of ways to use bundle adjustment with the state-space model: a batch approach, a window-based approach, and a recursive/sequential approach. With a recursive/sequential approach, bundle adjustment is applied whenever there is a new observation, e.g., a new pair of stereo images (see P. McLauchlan, "The Variable State Dimension Filter applied to Surface-Based Structure from Motion," CVSSP Tech Report TR-4/99, University of Surrey, 1999). Typically, some approximation is made to make the computations efficient. With a window-based approach, bundle adjustment is applied to a short sequence of states and measurements. With a batch approach, bundle adjustment is applied to all available states and measurements. Generally, in implementing bundle adjustment with the state-space model, a batch approach may give the best results, followed by a window-based approach, and finally a recursive/sequential approach.

For example, a batch bundle adjustment may be applied at each time instance or at selected time instances based on all the measurements and state estimates that are available from extended Kalman filtering. Applying a batch bundle adjustment at the beginning of the period over which the state-space model is applied may be preferred because 1) quick convergence to the correct solution of a non-linear optimization problem is desired from the beginning, and 2) the computation is efficient because there are only a small number of states and observations available.

Image Matching

Image matching is used for incorporating vision information into the tool tracking system and its state-space model. Image matching (Eq. 4) in the image analysis steps (Eqs. 4 and 5) finds the corresponding 2D feature image points and 3D points on the tool. Image matching also finds the corresponding image features between the left and right images from a stereo camera.

Image matching may be facilitated by sequence matching of a temporal sequence of frames of video images. Alternatively, image matching may be facilitated by using a 2D or 3D model of a tool and a video image. Implementation of the two-dimensional image matching may alternately be performed by simple intensity-based SSD (sum of squared differences) matching, feature-based matching (for example, matching of a point feature or a scale-invariant-feature-transform feature: D. Lowe, "Object recognition from local scale-invariant features," Proc. Int. Conf. Computer Vision, 1999), or probabilistic matching. When matching a 2D image with a 3D-model synthesized image, more robust features such as edges can be used. In this case, the cost function would be the sum of distance measures from the image edge points to the closest synthesized curves/lines (reference: D. Lowe, "Fitting parameterized three-dimensional models to images," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 13, pp. 441-450).
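As an illustration of the edge-based cost just described, the sum of distances from detected image edge points to the nearest synthesized edge can be computed with a distance transform. The following Python sketch assumes binary edge maps and is not tied to any particular edge detector; the names are illustrative.

import numpy as np
from scipy.ndimage import distance_transform_edt

def edge_distance_cost(image_edges, synthesized_edges):
    """Edge-based matching cost: sum of distances from detected image edge
    pixels to the nearest edge pixel of the synthesized (model-rendered) view.

    image_edges, synthesized_edges : boolean HxW edge maps
    """
    # Distance from every pixel to the nearest synthesized edge pixel
    dist_to_model = distance_transform_edt(~synthesized_edges)
    return float(np.sum(dist_to_model[image_edges]))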

Image matching can be applied in different scenarios, such as temporal matching of images within sequences of images (FIG. 11A) or spatial matching of images across two stereo views (FIG. 11C). In another embodiment of the invention, real images are used to match against synthesized images in the analysis-by-synthesis approach (FIG. 11B). In another embodiment of the invention, two or more of these image matching techniques (FIGS. 11A-11C) may be used together to perform image matching. In each of these embodiments of the invention, image matching could be applied to corresponding artificial markers attached to instruments, natural image features, or image appearances (e.g., instrument tips). The artificial markers are passive visual markers.

Referring now to FIG. 11A, temporal image matching of a pair of video images 1101V-1101V′ is illustrated. The video image 1101V of the actual tool 101 (e.g., see FIG. 1A) is taken at time t, resulting in a tool image 101V at time t. The video image 1101V′ of the same actual tool 101 is taken at a different time, time t+1, by the same camera, resulting in a tool image 101V′ at time t+1. During the time the images are captured, the camera may be fixed with respect to the robotic surgical system while the tool 101 may move relative to the robotic surgical system.

Various aspects of the video images of the tool taken at different times may be used to perform image matching. That is, one or more of a matching of markers 1110, a matching of natural features 1111, and/or an appearance matching 1112 may be performed. For example, markers 502V on the tool image 101V in the first video image 1101V of the tool 101 may be compared with the markers 502V′ on the tool image 101V′ in the second video image 1101V′ to help determine new pose information of the actual tool 101. Besides marker image matching, other information may be used to determine pose information of the actual tool 101.

Referring now to FIG. 11B, synthesis image matching of a video image 1101V and a synthesized image 1101S is illustrated. The video image 1101V of the actual tool 101 (e.g., see FIG. 1A) is taken by a camera, resulting in a tool image 101V. The synthesized image 1101S of the tool 101 is generated by a computer having prior knowledge of the actual tool 101, resulting in a synthesized tool image 101S. The pose of the synthesized tool image 101S in the synthesized image 1101S of the tool 101 attempts to match the pose of the tool represented by the tool image 101V in the video image 1101V at a moment in time.

Various aspects of the video image 1101V and the synthesized image 1101S of the tool may be used to perform image matching. That is, one or more of a matching of markers 1110, a matching of natural features 1111, and/or an appearance matching 1112 may be performed. For example, markers 502V in the first video image 1101V of the tool image 101V may be compared with the synthesized markers 502S on the synthesized tool image 101S in the synthesized image 1101S of the same tool 101 to help determine new pose information of the actual tool 101.

Referring now to FIG. 11C, spatial image matching of a left video image 1101VL and a right video image 1101VR is illustrated. The left video image 1101VL of the actual tool 101 (e.g., see FIG. 1A) is taken with a camera in a first position with respect to the tool, such as a left side, resulting in a left tool image 101VL. The right video image 1101VR of the same actual tool 101 is taken at a different position with respect to the tool, such as the right side, resulting in a right tool image 101VR. The left video image 1101VL and the right video image 1101VR are captured at substantially the same time by their respective cameras.

Various aspects of the left and right video images of the tool may be used to perform image matching. That is, one or more of a matching of markers 1110, a matching of natural features 1111, and/or an appearance matching 1112 may be performed. For example, markers 502VL on the left tool image 101VL in the left video image 1101VL of the tool 101 may be compared with the markers 502VR on the right tool image 101VR in the right video image 1101VR of the same tool to determine new pose information of the actual tool 101.

As mentioned herein, these image matching techniques may be combined to generate better pose information for the tool. For example, it is natural to combine temporal image matching (FIG. 11A) with spatial image matching (FIG. 11C), or to combine temporal image matching (FIG. 11A) with synthesis image matching (FIG. 11B). However, all three techniques may be used together or flexibly in various combinations to try to obtain the best pose information for the tool.

In another embodiment of the invention, stereo images are used to construct 3D feature points, and these 3D points can be matched (for example, through the popular iterative closest point algorithm; reference: P. Besl and N. McKay, "A Method for Registration of 3-D Shapes," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 14, pp. 239-256, 1992) against corresponding 3D points, for example markers, on the robotic instrument. Before we have stereo-derived 3D points, we need to apply a two-step approach: 1) matching of image correspondences, and 2) 3D reconstruction based on the geometry of the stereo cameras (FIG. 12B).

Previously, image matching has been described with respect to explicit image synthesis, such as a full image synthesis or a geometry-only image synthesis. However, image matching may also be performed using implicit image synthesis. That is, images are not actually synthesized. Rather, a computer aided design (CAD) model of the tools and prior pose information of the tools are implicitly used to facilitate image matching.

Moreover, artificial markers may be applied to the tools to assist in image matching, as is described below. However, natural markers or features of a tool may also be used to assist in image matching. Thus, the following descriptions apply equally well to the case when no artificial markers are present and features of the tool, i.e., natural markers, are detected directly from the instrument images.

Referring now to FIG. 5B, image matching can be significantly simplified yet made robust by exploring prior knowledge and available robotic kinematics, even in the presence of more than one robotic instrument in the field of view. FIG. 5B illustrates video images 101AV and 101BV for a respective pair of tools 101A and 101B in the field of view 510. FIG. 5B further illustrates pose information 101AK and 101BK based on kinematics for the respective tools 101A and 101B in and around the field of view 510. The video images 101AV and 101BV and the pose information 101AK and 101BK for the respective tools 101A and 101B may be adaptively fused together to improve the overall pose information for each. A plurality of marker dots 502A and 502B, or other types of markers, may be affixed to the respective tools 101A and 101B. Video information of the marker dots 502A′ and 502B′ may be ascertained from the video images 101AV and 101BV of the respective tools 101A and 101B.

For simplicity in explanation, it is assumed that the image feature points (marker dots 502A′-502B′) are reliably localized, and without loss of generality, it is further assumed that the image feature points form a pattern. Using the concept of a pattern, the image matching problem is simplified from many dot-to-dot matchings to a single pattern matching. A pattern matching or association is straightforward if one tool is in the field of view. If more than one robotic instrument tool is in the field of view, there are two approaches available to solving the image matching problem, and they can be used alone or combined together.

In one embodiment of the invention, it is assumed that the robotic kinematics information will not change the spatial arrangements (positional and/or orientational) across instruments, especially in the beginning of surgical operations. For example, if two instruments are arranged to the left and to the right in the camera coordinate system, then the robotic kinematics should represent that arrangement. However, this is not a requirement on the absolute values of the robotic kinematics information. For example, the robotic kinematics information may indicate that one or both of the robotic surgical tools are outside the field of view. Resolving the pattern association ambiguity in matching a first pattern in an image to the tool arranged to the left and a second pattern in the image to the tool arranged to the right can be carried out in either 2D image space or 3D space.

In another embodiment of the invention, tool motion is used to resolve the pattern association ambiguity. As the plurality of tools move differently (e.g., different directions and/or speeds), pattern images are tracked individually through temporal image matching. The motion trajectories of the pattern images are then compared against the kinematic information of each of the tools. From the comparison, the correct association of pattern images to tools can then be made.

There may be a pattern association issue for a single tool in the field of view. A pattern may be directionally ambiguous, such as a line pattern with identical markers that has a directional ambiguity of a 180-degree flip. Pattern association for a single tool is not an issue if the pattern consisting of artificial markers or natural markers is unique. For example, if the pattern of markers has directional information embedded such that the markers at each of the ends of the pattern are distinctive, there is no issue. In one embodiment of the invention, the design of the artificial markers on the tool provides directional information of the marker pattern.

If there is a directional ambiguity for a given tool, there are two approaches to solving the image matching problem, similar to how the pattern associations for multiple tools are handled. The first approach to solving the image matching problem is to use very rough robotic kinematics information to resolve any directional ambiguity. The robotic kinematics should not flip, although it may be far off, such as outside the field of view after projection onto the 2D image. The second approach to solving the image matching problem is to use motion trajectories of the pattern to remove the directional ambiguity.

View-Geometry Statistics for Adaptive Fusion

Image information may have quality issues (matching reliability and accuracy) regarding image-derived 3D information. For example, a viewing singularity happens when a cylindrical instrument with markers on the shaft is projected to an image of a small circle (see 1222S in FIG. 12A). That is, all the markers become invisible, hence there is no vision-based observation. Another extreme case, of perfect viewing, is when the instrument shaft lies right in the field of view so that all the markers are fully visible. That is, a circular marker is projected onto the image as a circle, for example. In practice, we often have scenarios between these extreme cases.

Viewing geometry is more than just the pure geometry between a camera and a 3D object. Other information related to viewing geometry can impact the quality of image-derived information for fusion.

Referring now to FIG. 12A, the geometry of an illumination source 1210A-1210B with respect to the camera 1211A-1211B and the 3D object may be of interest to achieve accurate and reliable information. The position of light source 1210A with respect to the camera 1211A and the 3D object 1200 generating view 1 may differ from the position of light source 1210B with respect to the camera 1211B and the 3D object 1200 generating view 2. Moreover, the different poses of the 3D object may change how a light source strikes its features and provide different views.

Additionally, different parts of a 3D object may behave differently under the same or different viewing conditions. For example, the tip 1200T and the shaft 1200S of a 3D tool 1200 may generate different image-derived 3D information with different reliabilities. In image 1221A the shaft image 1221S may be good to use, while in image 1221B the shaft image 1222S may be poor to use in forming image-derived 3D information.

The quality of image-derived 3D information also depends upon the particular image features used. For example, edge-based features are less sensitive to illumination change than intensity-based features.

These issues with regard to viewing geometry contribute toward how vision information should be generated and combined with robot kinematics for fused results that are reliable and accurate. The quality of the viewing geometry may be described statistically. View-geometry statistics are used to represent how good the image-derived information for fusion is when compared to the ground truth. That is, view-geometry statistics may be used to estimate uncertainty. The view-geometry statistics are used in the Bayesian state-space model for adaptively fusing image information and kinematics information. To be specific, the view-geometry statistics may be represented in the covariance matrix of the observation equation (Eq. 13).

The following may be considered for view-geometry statistics: digitization error/image resolution; feature/algorithm-related image matching error; distance from object to camera; angle between object surface normal and line of sight; illumination and specularity. Based on certain noise assumptions (e.g., independent Gaussian noise), view-geometry statistics for these phenomena may be computed. To give a specific example, we assume the case of parallel stereo images (FIG. 12B) where 3D points are first reconstructed and then fed into the state space. Under this ideal assumption, we have y^(r)=y^(l), and the x-directional image projections are

$x^{r} = {f_{x}\frac{X_{s} - {\frac{1}{2}B_{s}}}{Z_{s}}}\mspace{14mu}{and}\mspace{14mu} x^{l} = {f_{x}\frac{X_{s} + {\frac{1}{2}B_{s}}}{Z_{s}}},$

where B_(s) is the baseline distance between the two optical centers and f_(x) is the common focal length in the x-direction. Finally, the 3D reconstruction problem becomes a simple one with a parallel setup, as follows:

$\begin{matrix}\left\{ \begin{matrix}{X_{s} = {\frac{1}{2}{B_{s}\left( \frac{x^{l} + x^{r}}{x^{l} - x^{r}} \right)}}} \\{Y_{s} = {\frac{1}{2}{B_{s}\left( \frac{f_{y}}{f_{x}} \right)}\left( \frac{y^{l} + y^{r}}{x^{l} - x^{r}} \right)}} \\{Z_{s} = {B_{s}\left( \frac{f_{x}}{x^{l} - x^{r}} \right)}}\end{matrix} \right. & \left( {{Eq}.\mspace{14mu} 14} \right)\end{matrix}$

For presentation simplicity, we use d_(x)=x^(l)−x^(r) to represent the image disparity for the matched left and right image points.

In the following, we list equations and plots based on the assumption of independent Gaussian noise and the parallel stereo setup (Eq. 14). Assuming that we have image matching uncertainty σ_(x), we can derive the following equations:

$\begin{matrix}\begin{matrix}{{{Var}\left\{ X_{s} \right\}} \approx {{\overset{\_}{X}}_{s}^{2}\left\lbrack {\frac{\sigma_{x}^{2}}{{\overset{\_}{d}}_{x}^{2}} - \frac{\sigma_{x}^{4}}{{\overset{\_}{d}}_{x}^{4}}} \right\rbrack} \approx {{\overset{\_}{X}}_{s}^{4}\frac{\sigma_{x}^{2}}{B_{s}^{2}f_{x}^{2}}}} \\{{{Var}\left\{ Y_{s} \right\}} \approx {{\overset{\_}{Y}}_{s}^{2}\left\lbrack {\frac{\sigma_{x}^{2}}{{\overset{\_}{d}}_{x}^{2}} - \frac{\sigma_{x}^{4}}{{\overset{\_}{d}}_{x}^{4}}} \right\rbrack} \approx {{\overset{\_}{Y}}_{s}^{4}\frac{\sigma_{x}^{2}}{B_{s}^{2}f_{x}^{2}}}} \\{{{Var}\left\{ Z_{s} \right\}} \approx {{\overset{\_}{Z}}_{s}^{2}\left\lbrack {\frac{\sigma_{x}^{2}}{{\overset{\_}{d}}_{x}^{2}} - \frac{\sigma_{x}^{4}}{{\overset{\_}{d}}_{x}^{4}}} \right\rbrack} \approx {{\overset{\_}{Z}}_{s}^{4}\frac{\sigma_{x}^{2}}{B_{s}^{2}f_{x}^{2}}}}\end{matrix} & \left( {{Eq}.\mspace{14mu} 15} \right)\end{matrix}$

where symbols with a bar (e.g., Z̄_(s), d̄_(x)) represent the true values.
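A small Python sketch of the parallel-setup reconstruction of Eq. 14 and of the depth-variance approximation in the last line of Eq. 15 is given below; it assumes the ideal rectified geometry described above, and the names are illustrative.

import numpy as np

def parallel_stereo_point(xl, yl, xr, fx, fy, Bs):
    """3D reconstruction for the parallel stereo setup of Eq. 14
    (with y^l = y^r and disparity d_x = x^l - x^r)."""
    d = xl - xr
    Xs = 0.5 * Bs * (xl + xr) / d
    Ys = 0.5 * Bs * (fy / fx) * (2.0 * yl) / d     # y^l + y^r, using y^l = y^r
    Zs = Bs * fx / d
    return np.array([Xs, Ys, Zs])

def depth_variance(Z_true, sigma_x, fx, Bs):
    """Approximate variance of the reconstructed depth per Eq. 15:
    Var{Z_s} ~ Z_s^4 * sigma_x^2 / (Bs^2 * fx^2)."""
    return Z_true ** 4 * sigma_x ** 2 / (Bs ** 2 * fx ** 2)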

From this and the plots based on simulation of an exact camera model, we can conclude that the uncertainty (standard deviation) of the X/Y/Z estimate of a 3D point is proportional to the uncertainty of image matching, the inverse of the image resolution, and the square of the true value of X/Y/Z. The plot in FIG. 12E is based on simulation of the exact model and Gaussian random matching error to confirm the conclusion that the estimate uncertainty of a 3D point is proportional to the uncertainty of image matching.

The plot in FIG. 12C is based on simulation of an image digitization error of [−0.5, 0.5] pixel and correct image matching to confirm the conclusion that the estimate uncertainty of a 3D point is proportional to the inverse of the image resolution. The plot in FIG. 12D is based on simulation of varying depth to confirm the conclusion that the estimate uncertainty of a 3D point is proportional to the square of the true value of X/Y/Z.

View-geometry statistical analysis can also be applied to the most general case of camera setup, for example, a non-parallel setup. It can also be applied to the case of using 2D images rather than 3D stereo-reconstructed points as observations. In the following, we give a simple example. More specifically, let us apply the following projective equation

$\begin{matrix}{{x_{i} = {\psi\left( {K\left\lbrack {R_{C}| - C} \right\rbrack X_{i}} \right)}},} & \left( {{Eq}.\mspace{14mu} 16} \right)\end{matrix}$

for both the left and right cameras. ψ is the perspective projection (Eq. 1A) and K is the 3×3 camera intrinsic parameter matrix, while R_(C) (3×3) and C (3×1) represent the camera orientation and position, the extrinsic parameters. Overall, we can use a 3×4 matrix A to represent K[R_(C)|−C]. For the left camera and the right camera, we use A^(l) and A^(r), respectively. Hence the stereo problem becomes solving the following array of equations:

$\begin{matrix}{\begin{matrix}{x_{i}^{l} = {\psi\left( {A^{l}X_{i}} \right)}} \\{x_{i}^{r} = {\psi\left( {A^{r}X_{i}} \right)}}\end{matrix}} & \left( {{Eq}.\mspace{14mu} 17} \right)\end{matrix}$

We can linearize this array of equations as follows, with B being the 2×3 Jacobian matrix and {tilde over (x)} being the mean-subtracted variable:

$\begin{matrix}{\begin{matrix}{{\overset{\sim}{x}}_{i}^{l} \approx {B^{l}X_{i}}} \\{{\overset{\sim}{x}}_{i}^{r} \approx {B^{r}X_{i}}}\end{matrix}} & \left( {{Eq}.\mspace{14mu} 18} \right)\end{matrix}$

From Eq. 18, we can compute the Jacobian matrix required for computing the observation covariance matrix for the EKF in the case of using 2D images as observations, as follows:

$J_{i} = \begin{bmatrix}B^{l} \\B^{r}\end{bmatrix}^{+}$

where [ ]⁺ represents a pseudo inverse.
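In Python, the per-point Jacobian used for the 2D-observation covariance can be sketched by stacking the two linearized projection Jacobians of Eq. 18 and taking the pseudo-inverse, as assumed below (the names are illustrative):

import numpy as np

def point_jacobian(B_left, B_right):
    """Jacobian J_i used for the EKF observation covariance when 2D image
    points are the observations: the pseudo-inverse of the stacked left/right
    linearized projection Jacobians (B^l and B^r are each 2x3, Eq. 18)."""
    stacked = np.vstack([B_left, B_right])      # 4x3 stacked Jacobian
    return np.linalg.pinv(stacked)              # 3x4 pseudo-inverse J_i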

Sequence Matching by Exploring Kinematics Constraints

Referring now to FIGS. 13A-13B, the concept of sequence matching is now described. A technique is chosen for matching a pair of image features 1301 and 1302, such as from left and right images, respectively, in a stereo setting, or from a real image and a synthesized image. The pair of image features may be matched over one or more sequences of images 1301A-1301E and 1302A-1302E, for example.

A rigid sequence matching (where the relative kinematics within each sequence are perfectly known or identical across sequences in the ideal case) may be employed, where just one common motion parameter is estimated for all pairs between two sequences, as illustrated in FIG. 13A. The sequences 1301 and 1302 have an identical motion relationship (relative kinematics) among the images 1301A to 1301E and 1302A to 1302E. Alternatively, a flexible sequence matching (where the relative kinematics within each sequence are known with errors) may be employed, as illustrated in FIG. 13B. The sequences 1303 and 1302 have an identical motion relationship (relative kinematics) among the images 1303A to 1303E and 1302A to 1302E.

To match a single pair of features 1301C and 1302C in a single pair of images of a sequence, there may be only a 60% chance or probability of matching the pair of image features correctly, for example. This may be acceptable for a number of applications, but it typically would be unacceptable for medical or surgical applications. The probability of matching the image features should be much higher for medical and surgical applications that demand high accuracy.

If two or more temporal sequences 1301A-1301E and 1302A-1302E of images are available for matching and the kinematics among images within each sequence are known or identical across sequences, the probability of matching image features can be improved over that of a single pair of images. Assuming statistical independence, and that the chance or probability of matching a single pair of image features correctly is 60%, the chance of having a correct match improves to 78% with a sequence of 3 images, 92% with a sequence of 5 images, and 99% with a sequence of just 10 images, for example. However, if the relative kinematics for each pair of images in the sequence are not accurate, the chance of having correct matches is lower, and a bundle adjustment procedure should be used for improved matching. Within the bundle adjustment procedure, the relative kinematics and its uncertainty are taken into consideration.

In one example of applying sequence matching where there are two sequences of observations (e.g., image feature points) O₁={F_(1,1), . . . , F_(1,n)} and O₂={F_(2,1), . . . , F_(2,n)}, the problem is to find the geometric transform T_(s) between the two sequences through sequence matching. In matching individual pairs F_(1,i) and F_(2,i), a geometric transform T_(s,i):F_(1,i)→F_(2,i) may be computed. In other words, we have the following equation for individual matching

$\begin{matrix}{T_{s,i}^{*} = {\arg \mspace{14mu} {\min\limits_{T_{s,i}}\mspace{14mu} {C\left\{ {{F_{1,i}\left( {T_{s,i}(X)} \right)},{F_{2,i}(X)}} \right\}}}}} & \left( {{Eq}.\mspace{14mu} 19} \right)\end{matrix}$

where C is the cost function of matching features F_(1,i) and F_(2,i), and X represents the location of the features in either two or three dimensions. For example, the geometric transformation T_(s,i) may be in 3D while the features may be in 2D after a known camera projection. If the relative kinematics {dot over (k)}_(1/2,i) within each sequence are perfectly known, then the geometric transformation equation for sequence matching becomes

$\begin{matrix}{T_{s}^{*} = {\arg \mspace{11mu} {\min\limits_{T_{S}}\mspace{14mu} {C\left\{ {{F_{1,1}\left( {T_{s}^{{\overset{.}{k}}_{1,1}}(X)} \right)},{{F_{2,1}\left( {{\overset{.}{k}}_{2,1}(X)} \right)};\ldots \mspace{11mu};{F_{1,n}\left( {T_{s}^{{\overset{.}{k}}_{1,n}}(X)} \right)}},{F_{2,n}\left( {{\overset{.}{k}}_{2,n}(X)} \right)}} \right\}}}}} & \left( {{Eq}.\mspace{14mu} 20} \right)\end{matrix}$

where T_(s) ^({dot over (k)}_(1,1)) represents the transformation obtained from T_(s), typically a chosen T_(s,i), through the known relative kinematics.

In the geometric transformation (Eq. 20), the same amount of available information is used to estimate just one parameter T_(s) as is used to estimate a series of parameters T_(s,i) in Eq. 19. Consequently, the use of a sequence of images and relative kinematics provides a result that is much more accurate and robust. Sequence matching is more accurate and robust with as many diverse images in the sequence as possible. If the images in the sequence are all the same, then the assumption of statistical independence per matching is false. As a result, the matching performance with a sequence of images will not be improved. For example, in the tool acquisition operational stage 604 illustrated in FIG. 6A, diversified image sequences may be generated by translating and/or rotating the tools within the field of view of the camera.
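A hedged Python sketch of the rigid sequence-matching cost of Eq. 20 follows. The exact way the candidate transform is composed with the relative kinematics of each sequence is an assumption of the sketch (here, conjugation by the per-frame kinematic transforms), as are all function and parameter names.

import numpy as np

def sequence_cost(T_s, rel_kin_1, rel_kin_2, features_1, features_2, pair_cost):
    """Rigid sequence-matching cost in the spirit of Eq. 20: a single
    candidate transform T_s is propagated through the known relative
    kinematics of each sequence and the per-pair costs are accumulated.

    T_s          : candidate 4x4 transform between the two sequences
    rel_kin_1/2  : lists of 4x4 relative-kinematics transforms mapping the
                   first frame of each sequence to frame i
    features_1/2 : per-frame feature sets (e.g. Nx3 marker positions)
    pair_cost    : function C(F1, F2, T) scoring one pair under transform T
    """
    total = 0.0
    for K1, K2, F1, F2 in zip(rel_kin_1, rel_kin_2, features_1, features_2):
        # Transform between frame i of the two sequences implied by T_s and
        # the relative kinematics of both sequences (one plausible composition)
        T_i = K2 @ T_s @ np.linalg.inv(K1)
        total += pair_cost(F1, F2, T_i)
    return total

# The best T_s is then found by minimizing sequence_cost over candidate
# transforms, so the same data estimate one parameter instead of n of them.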

Sequence matching methods may be applied in many different scenarios, e.g., a single-view case (a sequence of real images against a sequence of synthesized images) or a stereo case (matching among two sequences of real images and two sequences of synthesized images). However, the relative kinematics may not be perfect and may change over time. Hence, bundle-adjustment-based flexible sequence matching may be applied to estimate the optimal parameters to more accurately locate and track tools.

Tool Tracking for Image Guided Surgery

A tool tracking system for a robotic instrument has a number of applications. One application for a tool tracking system is image-guided surgery (IGS), or more specifically image-guided endoscopic surgery (IGES). The basic goal of image-guided surgery is to enhance a surgeon's experience by providing real-time information derived from single or multiple imaging modalities (e.g., visual, x-ray, computerized tomography (CT), magnetic resonance imaging (MRI), ultrasound) during surgery or training/simulation. Two particular benefits of IGS/IGES are 1) improved visualization for easier on-line diagnostics and 2) improved localization for reliable and precise surgery. Tool tracking is one technology used for IGS/IGES, since instruments are used by surgeons to navigate, sense, and operate (e.g., diagnose, cut, suture, ablate, etc.) in the areas of interest. That is, the tool tracking system described herein can enable image-guided surgery without significant added operational inconvenience and/or added equipment. The tool tracking system described herein may also be used in other applications, such as dynamically reconstructing the geometry of organs, surgical simulation, and training.

Tool tracking may be used to provide automated camera control and guidance to maintain a robotic instrument in the field of view. Tool tracking can also be used to assist the surgeon in moving the robotic instrument to reach a tumor, either automatically or with a surgeon's assistance. Ultrasound or pre-scanned images can also be used along with real-time tool tracking. Other applications of tool tracking include a graphical user interface that facilitates the entrance and re-entrance of the robotic instrument during surgery.

Tool tracking can be used to take a number of measurements during surgery as well. For example, tool tracking may be used to measure organ sizes by touching the robotic tool tip to different points of an organ. A pair of robotic tools being tracked can concurrently touch points of the organ, and a distance along a line between their tips can be accurately measured with the assistance of tool tracking. Additionally, tool tracking may be used to construct a 3D model of an organ. The tip of a single robotic tool may be used to touch points across the organ's surface to construct a 3D model of the organ.
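The tip-to-tip measurement reduces to a Euclidean distance once both tracked tips are expressed in the same (camera) coordinate system, as in the following minimal sketch (names are illustrative):

import numpy as np

def tip_to_tip_distance(tip_a_xyz, tip_b_xyz):
    """Straight-line distance between two tracked tool tips touching an
    organ, both expressed in the camera coordinate system."""
    return float(np.linalg.norm(np.asarray(tip_a_xyz) - np.asarray(tip_b_xyz)))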

Tool tracking can be used to align different image modalities together. Referring now to FIG. 17, a perspective view of a surgical site 1700 includes a robotic ultrasound tool 1710. The robotic ultrasound (US) tool 1710 has an attached or integrated ultrasound transducer 1710A. The robotic ultrasound (US) tool 1710 may be used to guide and navigate other instruments to perform various medical or surgical procedures. By touching tissue 1705 in the surgical site with the robotic ultrasound tool 1710, two-dimensional ultrasound images 1711A may be captured in a two-dimensional coordinate system 1703. The two-dimensional ultrasound images 1711A may be translated from the two-dimensional coordinate system 1703 into a camera coordinate system 1701. The translated ultrasound images may then be overlaid onto video images of the surgical site 1700 displayed by the stereo viewer 312, such as illustrated by the translated ultrasound images 1711B-1711D in the surgical site 1700 illustrated in FIG. 17. Tool tracking may be used to flagpole ultrasound by (1) determining the transformation of the ultrasound images 1711A from the two-dimensional coordinate system 1703 to the local ultrasound coordinate system 1702 in response to ultrasound calibration; (2) at the transducer 1710A, determining the transformation from the ultrasound transducer coordinate system 1702 to the camera coordinate system 1701 by using tool tracking; and then (3) cascading the transformations together to overlay the ultrasound image in the camera coordinate system 1701 onto the surgical site, as illustrated by image 1711B. The quality of the image overlay depends on the tracking accuracy of the robotic ultrasound tool within the camera coordinate system (FIG. 5A) and on the ultrasound calibration. Ultrasound calibration is described in the reference "A novel closed form solution for ultrasound calibration," Boctor, E.; Viswanathan, A.; Choti, M.; Taylor, R. H.; Fichtinger, G.; Hager, G., Biomedical Imaging: Nano to Macro, 2004, IEEE International Symposium, 15-18 Apr. 2004, pages 527-530, Vol. 1. Tool tracking may also be used to generate 3D/volumetric ultrasound images by stacking overlaid 2D images 1711B-D from the 2D ultrasound transducer 1710A generated by rotating the ultrasound tool 1710 as illustrated by the arrow 1720.
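A minimal sketch of cascading the two transformations described above is given below; the matrix names are illustrative, and the ultrasound image plane is assumed to lie at z = 0 in the image coordinate system after calibration scaling.

import numpy as np

def ultrasound_pixel_to_camera(pixel_uv, T_us_from_image, T_cam_from_us):
    """Cascade the transforms described above to place a 2D ultrasound
    pixel into the camera coordinate system for overlay.

    pixel_uv        : (u, v) pixel in the 2D ultrasound image
    T_us_from_image : 4x4 transform from the 2D image coordinate system to the
                      local ultrasound transducer frame (from US calibration)
    T_cam_from_us   : 4x4 transform from the transducer frame to the camera
                      frame (from tool tracking of the ultrasound tool)
    """
    p_image = np.array([pixel_uv[0], pixel_uv[1], 0.0, 1.0])   # image-plane point
    p_camera = T_cam_from_us @ T_us_from_image @ p_image
    return p_camera[:3]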

As shown in FIG. 16, tool tracking may be used to drop one or more virtual points/marks 1650A-1650B onto images of the tissue surface 1600 in the surgical site by using one or multiple tools 1610L, 1610R (for example, a surgical tool or an ultrasound tool) to touch points of interest. For example, in a telestration operation, teaching surgeons can use tools to draw virtual marks to illustrate areas of interest to remote student surgeons on an external display. Another example is that a surgeon can use one type of tracked tool (e.g., an ultrasound tool) to draw marks to indicate regions of interest and then use a different type of tracked tool (e.g., a cautery tool) to operate or perform a surgical or other medical procedure in the selected regions of interest.

The tool tracking system described herein may also be used for image-guided interventional radiology (IGIR) along with other sensors. For example, active sensors/cameras (e.g., an electro-magnetic sensor, or active near-infrared illumination plus a stereo camera) may be used to scan patient bodies for navigation or 3D reconstruction with tool tracking during surgery.

Referring now to FIG. 15, a flow chart 1500 of an IGS application with tool tracking is illustrated. The flow chart 1500 illustrates a robotic surgery for a tumor in or around a liver. However, IGS with tool tracking may be used for other medical and surgical procedures for different organs and tissue that are robotically controlled or robotically assisted.

At block 1501A, a liver or other organ/tissue may be scanned prior to surgery with a computer tomography scanner 1503 to obtain a number of images of the liver such that they may be used by a computer at block 1504 to reconstruct a complete 3D volume of the scanned object as desired. There may be a slight computer tomography error E_(CT) for the 3D reconstruction that may be minimized.

At block 1505, a computer tomography volume that includes the liver and the surrounding area is selected to generate a computer tomographic (CT) segment that contains the liver only. At this point, the CT volume segment is taken to register against a surface/depth map from other imaging modalities, e.g., stereo cameras.

The real liver during surgery 1501B may have some biological variation forming an error (E_(BIO)) from the prior scan taken by the CT scanner.

One or more robotic instrument tips 1510 are inserted into the surgical site.

One or more endoscopic cameras 1512 take sequences of images 1513-1514 of the surgical site including the liver 1501B and the robotic instrument tips 1510. These sequences of images 1513-1514 are typically stereo image pairs but could be a sequence of single images from a mono-view. The sequence of images 1513 may be used to determine the depth of surface features to make a surface/depth map 1515. In one embodiment of the invention, a sequence of surface maps 1515 including a robotic instrument may be analyzed similar to a sequence of images 1513 as described herein to determine a location of the robotic instrument.
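A minimal sketch of turning a rectified stereo image pair 1513 into a surface/depth map 1515 using OpenCV block matching; the focal length, baseline, and matcher parameters are placeholders standing in for an actual endoscope calibration, and block matching is just one of several possible stereo methods.

```python
import cv2
import numpy as np

def stereo_depth_map(left_gray, right_gray, focal_px=800.0, baseline_mm=5.0):
    """Compute a depth map (mm) from a rectified endoscopic stereo pair.

    Inputs are 8-bit single-channel (grayscale) images; the focal length and
    baseline are illustrative values, not a real endoscope calibration.
    """
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan              # invalid / unmatched pixels
    depth_mm = focal_px * baseline_mm / disparity   # depth = f * B / d
    return depth_mm
```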

The surface map 1515 may be overlaid with a model of an organ after surface registration. The surface map 1515 may be further annotated with a map, outline, or other indication of the estimated tumor location 1517. For example, an internal liver tumor that is not visually visible may be easily visible in a CT scan of the liver. An error between the estimated tumor location and the actual tumor location during surgery may be a sum of the errors of the scanning (E_(CT)), the surface registration (E_(REG)), the biological changes (E_(BIO)), and the formation of the depth/surface map (E_(ST)). This error can be reduced with further information from tool tracking and from touching a tool tip to the tissue.

The sequences of images 1514 may be stereo image pairs or a sequence of single images from a mono-view. The sequence of images 1514, along with kinematics information, may be adaptively fused together to translate a model of the tool tip (M_(ET)) into the endoscopic camera frame of reference (ECM). The model of the tool tip formed from the adaptive-fusion-based tool tracking may be used to estimate the tool tip location 1518. An estimation error (E_(ET)) between the actual tip location 1520 and the estimated tip location 1518 may be made small by the tool tracking methods described herein. By using the tumor location 1550 and the tool tip location 1520 during surgery, the tools can be driven so that their tips reach the tumor, one example of image-guided surgery. One or more of the robotic surgical tools may include a needle at its tip as the end effector. The operational error E_(OP) between the actual location of the tumor and the actual location of the tool tips is a sum of these errors.
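Collecting the error terms from the two preceding paragraphs into one budget, with E_TUMOR used here only as shorthand for the tumor-localization error described above, and treating the errors as simply additive as the text does (in practice this is a worst-case bound, since the individual errors need not be aligned):

```latex
\begin{align}
E_{\text{TUMOR}} &= E_{CT} + E_{REG} + E_{BIO} + E_{ST}
  && \text{(estimated vs.\ actual tumor location)}\\
E_{OP} &= E_{\text{TUMOR}} + E_{ET}
  && \text{(actual tumor location vs.\ actual tool-tip location)}
\end{align}
```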

Referring now to FIG. 16, a perspective view of a surgical site to provide image guided surgery is illustrated. A pre-scanned image 1602A (for example, CT images) may be aligned to a camera coordinate system 1605 and then overlaid as an overlaid image 1602B onto a surface map or depth map 1600 of the tissue surface.

Visual information and kinematics information of the first robotic instrument 1610L and the second robotic instrument 1610R may be adaptively fused together to more accurately determine the position of the tools within the surgical site. With both the position of the overlaid image 1602B and the tools 1610L, 1610R aligned to the camera coordinate system 1605, a tool may be automatically or manually guided to perform a procedure on the tissue. The overlaid image 1602B may represent a tumor on the tissue surface or buried below the tissue surface that would otherwise be hidden from view. As portions of the tool 1610L or 1610R are moved below the tissue surface so that they are occluded, an image 1610B of the portion of the tool below the surface may be synthesized in response to the tool tracking information and the adaptive fusion of kinematics with a priori video information of the tool.

The one or more robotic surgical tools 1610L and 1610R may be used to take measurements or determine a surface profile. Tissue in a surgical site may be touched with the tool tip of the robotic tool 1610L at a first point, and tissue in the surgical site may be touched with the tool tip of the robotic tool 1610R at a second point. A sequence of images of the surgical site including the robotic surgical tools 1610L and 1610R may be captured. Kinematics information and image information of the robotic tools may be adaptively fused together to accurately determine the tool tip locations at the first and second points. The pose of the first tool tip at the first point and the pose of the second tool tip at the second point may be compared to determine a distance between them. The tool tips may be touching external surfaces of a tumor to determine a diameter of the tumor. Alternatively, the tool tips may be touching external surfaces of an organ to measure the diameter of the organ.
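A minimal sketch of the measurement step, assuming the fused tool-tracking estimates provide each tool-tip position as a 3-vector in the common camera coordinate system; the numeric values in the usage line are illustrative only.

```python
import numpy as np

def tip_to_tip_distance(tip_left_cam, tip_right_cam):
    """Euclidean distance between two fused tool-tip positions (camera frame).

    Both arguments are assumed to be 3-vectors in the same camera coordinate
    system, e.g. the adaptively fused estimates for tools 1610L and 1610R
    while each tip touches the tissue.
    """
    return float(np.linalg.norm(np.asarray(tip_left_cam) - np.asarray(tip_right_cam)))

# Example: touching opposite external surfaces of a tumor gives its diameter.
diameter_mm = tip_to_tip_distance([12.0, 3.5, 80.0], [27.0, 4.0, 82.0])
```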

Within the presented framework for adaptive fusion of vision and kinematics for tool tracking, we can also incorporate the depth maps.

In one embodiment of the invention, we use depth maps for localizing tools. A sequence of depth maps including a robotic instrument may be analyzed to determine a location of robotic instruments. Robotic instruments can be located in the camera coordinate frame using any depth map in which the tool can be identified. Kinematics datum provides an approximate location for a robotic instrument that is to be tracked. This is a priori knowledge for the next iteration of the tracking problem. The depth map is analyzed in the environs of the approximate location to locate the robotic instrument. Other information may be employed to improve the a priori knowledge, such as a dynamic model of the robotic instrument, or knowledge of the type of procedure being performed under the camera capturing the images. The locating of the robotic instrument may be performed over any number of depth maps.
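A minimal sketch of analyzing the depth map only in the environs of the kinematics-predicted location. The window size and the tool-likeness score are placeholders: here the nearest (smallest-depth) valid pixel simply stands in for a real tool detector based on the instrument's geometry or appearance.

```python
import numpy as np

def locate_tool_in_depth_map(depth_map, predicted_uv, window=40):
    """Search the depth map near the kinematics-predicted pixel location.

    `predicted_uv` is the (row, col) pixel where raw kinematics projects the
    tool tip; `window` bounds the search region around that a priori estimate.
    """
    r0, c0 = predicted_uv
    r_lo, r_hi = max(r0 - window, 0), min(r0 + window, depth_map.shape[0])
    c_lo, c_hi = max(c0 - window, 0), min(c0 + window, depth_map.shape[1])
    patch = depth_map[r_lo:r_hi, c_lo:c_hi]
    if patch.size == 0 or np.all(np.isnan(patch)):
        return None                       # tool not found near the prediction
    r, c = np.unravel_index(np.nanargmin(patch), patch.shape)
    return (r_lo + r, c_lo + c)           # refined pixel location of the tool
```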

If a correlation exists between sequential views, the problem of sequential location of a target robotic surgical tool is a tool tracking problem. The depth maps may be sequential views, arranged in time order, in which case there is a correlation between the successive views.

As noted above, the kinematics datum provides a priori knowledge of the approximate location of the robotic instrument for the next iteration of the tracking problem, the depth map is analyzed in the environs of that approximate location, and other information, such as a dynamic model of the robotic instrument or knowledge of the type of procedure being performed, may be employed to improve the a priori knowledge. If the robotic tool is obscured, a current optimal estimate of the location of the surgical instrument may be made (an a posteriori estimate) using the a priori knowledge and the depth map. An instantaneous (re-)correction of the kinematics datum may be computed by adaptively fusing together the available kinematics data, visual information, and/or a priori information. The correction to the current state is used to update the ongoing correction of future kinematics data. In one embodiment of the invention, the correction is simply made to future data without regard for past corrections. In another embodiment of the invention, the sequence of corrections is analyzed and an optimal correction based on all available past corrections is computed and used to correct the kinematics data. Analysis-by-synthesis and appearance learning techniques may be used to improve the correction to the current state.
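A minimal sketch of maintaining an ongoing correction applied to future kinematics data from the instantaneous corrections; the exponential smoothing used here is an illustrative choice for combining past corrections, not the document's stated method.

```python
class KinematicsCorrector:
    """Maintain a running correction (offset) applied to raw kinematics data.

    Each time vision/depth data yields a fused (a posteriori) tool location,
    the instantaneous correction updates a smoothed offset that is then
    applied to future raw kinematics readings. The smoothing factor is an
    illustrative value.
    """

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.offset = None     # current correction, e.g. a 3-vector

    def update(self, raw_kinematics_pos, fused_pos):
        instantaneous = [f - r for f, r in zip(fused_pos, raw_kinematics_pos)]
        if self.offset is None:
            self.offset = instantaneous
        else:
            self.offset = [(1 - self.alpha) * o + self.alpha * c
                           for o, c in zip(self.offset, instantaneous)]

    def correct(self, raw_kinematics_pos):
        if self.offset is None:
            return list(raw_kinematics_pos)
        return [r + o for r, o in zip(raw_kinematics_pos, self.offset)]
```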

Algorithms that locate the surgical instrument in the depth map and provide the optimal kinematic correction can be further optimized by an understanding of the relative variances in the corrected kinematic residual error versus the variances in the computed robotic instrument location from the depth map. Kinematics is suspected of initially having a large DC bias but a relatively small variance. A well-designed image processing subsystem may have substantially zero DC bias but a relatively large variance. An example of an optimal correction that accommodates these two differing noise processes is one generated by a Kalman filter.
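A minimal scalar sketch of such a Kalman-style correction, assuming the quantity being estimated is the slowly drifting DC bias of the kinematics along one axis, observed through the noisy difference between the vision-derived and kinematics-derived positions; the process and measurement variances are illustrative values, not tuned parameters of the system described here.

```python
class BiasKalman1D:
    """Scalar Kalman filter that estimates the DC bias of the kinematics data.

    State: the (slowly varying) offset between the true tool position and the
    raw kinematics along one axis. Measurement: vision-derived position minus
    raw kinematics position (roughly zero-mean but relatively noisy).
    """

    def __init__(self, q=1e-4, r=4.0):
        self.q = q          # process noise variance (bias drifts slowly)
        self.r = r          # measurement noise variance (vision is noisy)
        self.bias = 0.0     # current bias estimate
        self.p = 100.0      # estimate variance (large initial uncertainty)

    def update(self, vision_pos, kinematics_pos):
        # Predict: bias assumed nearly constant, uncertainty grows slightly.
        self.p += self.q
        # Update with the observed offset between vision and kinematics.
        z = vision_pos - kinematics_pos
        k = self.p / (self.p + self.r)          # Kalman gain
        self.bias += k * (z - self.bias)
        self.p *= (1.0 - k)
        return kinematics_pos + self.bias       # corrected tool position
```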

Tool Tracking User Interface

As previously described herein with reference to FIG. 16, if portions of the tool 1610L or 1610R are moved below the tissue surface so that they are occluded, an image 1610B of the portion of the tool below the surface may be synthesized. The synthesized image portion 1610B may be included in the tool images 400L, 400R as the synthesized image portion 400B displayed on the display devices 402L, 402R of the stereo viewer 312 illustrated in FIG. 4.

Referring now to FIG. 5B, while tracking a robotic surgical tool, it may exit the field of view 510 of a camera entirely, such as illustrated by the tool 101F. The robotic surgical tool may no longer be in the field of view as a result of camera movement over the surgical site, robotic surgical tool movement, or a combination of both. For example, the camera may move away from the position of the robotic surgical tool in a surgical site such that the robotic surgical tool is outside the field of view of the camera. As another example, the robotic surgical tool may move away from a position of the camera in a surgical site such that the robotic surgical tool is outside the field of view of the camera. In either case, a surgeon may be left guessing where the robotic surgical tool is outside the field of view unless some indication is provided to him in the stereo viewer 312.

In FIG. 4, compass icons or a compass rose (generally referred to with reference number 420) may be displayed in the display devices of the stereo viewer 312 to provide an indication of where the robotic surgical tool is located outside the field of view and a direction of tool reentrance into the field of view. For example, one of a plurality of compass icons for the directions North 420N, South 420S, East 420E, and West 420W, as well as directions in between such as North-East 420NE, North-West 420NW, South-East 420SE, and South-West 420SW, may be indicated in the stereo viewer to indicate tool reentrance into the field of view of the camera over a surgical site.

To indicate tool reentrance into the field of view, a robotic surgical tool is tracked in and out of the field of view of the camera. A determination is made, by comparing the positions of the camera and a tool using the available tool tracking information of the robotic surgical tool, whether or not the tool is outside the field of view of the camera. If the robotic surgical tool is out of the field of view, then one of the plurality of compass icons 420 may be displayed in the field of view to show a direction of tool reentrance.
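A minimal sketch of selecting one of the eight compass icons from the tracked tool position relative to the camera view center. The camera-frame conventions (x toward East, y toward North in the viewer) and the field-of-view extents are assumptions made for the illustration.

```python
import math

COMPASS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]  # icons 420E, 420NE, ...

def compass_icon(tool_pos_cam, half_fov_x_mm, half_fov_y_mm):
    """Pick the compass icon indicating where an out-of-view tool would reenter.

    `tool_pos_cam` is the tracked tool position in the camera frame, with x
    pointing right (East) and y pointing up (North) in the viewer; the
    half-field extents define the visible region at the tool's depth.
    """
    x, y = tool_pos_cam[0], tool_pos_cam[1]
    if abs(x) <= half_fov_x_mm and abs(y) <= half_fov_y_mm:
        return None                         # tool is inside the field of view
    angle = math.degrees(math.atan2(y, x)) % 360.0
    return COMPASS[int((angle + 22.5) // 45) % 8]
```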

Additional information, such as whether the tool is getting closer to or farther from the field of view, may be conveyed to a surgeon looking in the stereo viewer 312 by somewhat altering the one compass icon 420 in the display that indicates the direction of the tool. For example, flashing the icon fast or slow may indicate that the robotic surgical tool is getting closer to or farther away from the field of view, respectively. As another example, arrowheads may be added to the ends of a bar icon to indicate that the robotic surgical tool is moving towards or away from the field of view. In an alternate implementation, the icon indicating the direction of reentrance may be colored to indicate movement bringing the tool closer to the field of view (e.g., red for getting warmer) or movement taking the tool further away from the field of view (e.g., green for getting colder).
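A small sketch of one way the icon could be modulated from successive distance estimates; the flash rates and colors mirror the warmer/colder convention above but are otherwise arbitrary choices.

```python
def icon_style(prev_distance_mm, curr_distance_mm):
    """Modulate the compass icon based on whether the tool is approaching.

    Returns an illustrative (flash_rate_hz, color) pair: faster flashing and a
    'warmer' color when the out-of-view tool is getting closer to the field of
    view, slower flashing and a 'colder' color when it is moving away.
    """
    approaching = curr_distance_mm < prev_distance_mm
    return (4.0, "red") if approaching else (1.0, "green")
```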

CONCLUSION

The embodiments of the tool tracking system described herein provide an automatic integrated system that is accurate and reliable by adaptively fusing kinematics and visual information, synthesizing images based on a model and prior poses, and employing sequence matching.

A number of elements of the tool tracking system are implemented in software and executed by a computer and its processor, such as the computer 151 and its processor 302. When implemented in software, the elements of the embodiments of the invention are essentially the code segments to perform the necessary tasks. The program or code segments can be stored in a processor readable medium or transmitted by a computer data signal embodied in a carrier wave over a transmission medium or communication link. The processor readable medium may include any medium that can store or transfer information. Examples of the processor readable medium include an electronic circuit, a semiconductor memory device, a read only memory (ROM), a flash memory, an erasable programmable read only memory (EPROM), a floppy diskette, a CD-ROM, an optical disk, a hard disk, a fiber optic medium, a radio frequency (RF) link, etc. The computer data signal may include any signal that can propagate over a transmission medium such as electronic network channels, optical fibers, air, electromagnetic, RF links, etc. The code segments may be downloaded via computer networks such as the Internet, an intranet, etc.

While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that the embodiments of the invention not be limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art. For example, some embodiments of the invention have been described with reference to a robotic surgical system. However, these embodiments are equally applicable to other robotic systems. Thus, the embodiments of the invention should be construed according to the claims that follow below.

1-25. (canceled)
26. A robotic system comprising: a first robotic tool having a first tool tip; a second robotic tool having a second tool tip; and a processor programmed to: fuse kinematics derived position information and image derived position information of the first robotic tool in a state-space model by using a tool tracking system to estimate a first tool tip location while the first tool tip is touching tissue at a first point; fuse kinematics derived position information and image derived position information of the second robotic tool in a state-space model by using the tool tracking system to estimate a second tool tip location while the second tool tip is touching the tissue at a second point; and compare the first tool tip location and the second tool tip location by using the tool tracking system to determine a distance between the first and second points on the tissue.
27. The robotic system of claim 26, wherein the tissue comprises a tumor; the first point is on a first external surface of the tumor; the second point is on a second external surface of the tumor; and the comparing determines a diameter of the tumor.
28. The robotic system of claim 26, wherein the tissue comprises an organ; the first point is on a first external surface of the organ; the second point is on a second external surface of the organ; and the comparing determines a diameter of the organ.
29. The robotic system of claim 26, wherein the processor is programmed to: process sensor data of the first robotic tool by using the tool tracking system to generate the kinematics derived position information of the first robotic tool; and process sensor data of the second robotic tool by using the tool tracking system to generate the kinematics derived position information of the second robotic tool.
30. The robotic system of claim 26, further comprising: an image capture device for capturing a sequence of images of the first robotic tool and the second robotic tool at a surgical site; wherein the processor is programmed to process the sequence of images of the first robotic tool and the second robotic tool at the surgical site by using the tool tracking system to generate the image derived position information of the first and second robotic tools.
31. The robotic system of claim 30, wherein the processor is programmed to process the sequence of images of the first robotic tool and the second robotic tool at the surgical site by using an image-by-synthesis approach wherein synthesized images of computer models of the first and second robotic tools are compared against images of the first and second robotic tools in the sequence of images.
32. The robotic system of claim 30, wherein the processor is programmed to process the sequence of images of the first robotic tool and the second robotic tool at the surgical site by using a feature matching approach wherein features of the first and second robotic tools are identified in the sequence of images.
33. The robotic system of claim 30, wherein the processor is programmed to process the sequence of images of the first robotic tool and the second robotic tool at the surgical site by using a sequence matching approach wherein objects or features in a sequence of images captured from a view of the image capturing device are matched against objects or features in a sequence of images captured from a different view of the image capturing device.
34. The robotic system of claim 30, wherein the processor is programmed to process the sequence of images of the first robotic tool and the second robotic tool at the surgical site by using a sequence matching approach wherein objects or features in a sequence of images captured from a view of the image capturing device are matched against objects or features in a sequence of synthesized images.
35. The robotic system of claim 26, wherein the processor is programmed to fuse the kinematics derived position information and the image derived position information of the first robotic tool in the state-space model by using a sequential Bayesian approach; and fuse the kinematics derived position information and the image derived position information of the second robotic tool in the state-space model by using the sequential Bayesian approach.
36. A method for tracking movement of a robotic instrument, the method comprising: receiving images of video frames from at least one camera; receiving kinematics information related to robotic movement of the robotic instrument; determining mechanical pose information from the kinematics information; synthesizing model pose information of a computer aided design model of the robotic instrument using the mechanical pose information; determining video pose information of the robotic instrument by using the synthesized model pose information as a pattern for pattern searching within the images; and providing a state-space model of a sequence of states of corrected kinematics information for accurate pose information of the robotic instrument, the state-space model to receive raw kinematics information of mechanical pose information and to adaptively fuse the mechanical pose information and the video pose information together to generate the sequence of states of the corrected kinematics information for the robotic instrument.
37. The method of claim 36, wherein the synthesized model pose information of the model of the robotic instrument includes one or more markers of the robotic instrument forming a pattern.
38. The method of claim 37, wherein the one or more markers include artificial markers consisting of a pattern of dots.
39. The method of claim 37, wherein the one or more markers include natural markers consisting of geometry information of the computer aided design model.
40. A method for tracking movement of a robotic instrument, the method comprising: receiving images of video frames from at least one camera; determining video pose information of the robotic instrument within the images; estimating uncertainty in the determined video pose information in light of video information in response to view geometry statistics; receiving kinematics information related to robotic movement of the robotic instrument; determining mechanical pose information from the kinematics information; and fusing the mechanical pose information and the video pose information in a state-space model using a covariance matrix configured to compensate for the estimated uncertainty in the video pose information to generate estimated pose information for the robotic instrument.
41. The method of claim 40, wherein the state-space model includes a dynamic model and an observation model respectively including dynamic noise and observation noise that respectively have dynamic and observation Gaussian distributions respectively characterized by dynamic and observation covariance matrices, wherein the observation covariance matrix includes a sub-matrix for vision; and wherein the method further comprises: configuring the sub-matrix for vision to compensate for the estimated uncertainty of video information by modifying elements of the sub-matrix so as to adjust a standard deviation for the estimated pose information for the robotic instrument accordingly.
42. The method of claim 40, wherein assuming an independent Gaussian noise model, the view geometry statistics are computed for one or more of digitization error/image resolution, feature/algorithm related image matching error, distance from object to camera, angle between object surface normal and line of sight, illumination, and specularity.
43. A robotic system comprising: a camera; a robotic instrument; and one or more processors programmed so as to cooperatively perform the following tasks: receive images of video frames from the camera; receive kinematics information related to robotic movement of the robotic instrument; determine mechanical pose information from the kinematics information; synthesize model pose information of a computer aided design model of the robotic instrument using the mechanical pose information; determine video pose information of the robotic instrument by using the synthesized model pose information as a pattern for pattern searching within the images; and provide a state-space model of a sequence of states of corrected kinematics information for accurate pose information of the robotic instrument, the state-space model to receive raw kinematics information of mechanical pose information and to adaptively fuse the mechanical pose information and the video pose information together to generate the sequence of states of the corrected kinematics information for the robotic instrument.
44. The robotic system of claim 43, wherein the synthesized model pose information of the model of the robotic instrument includes one or more markers of the robotic instrument forming a pattern.
45. The robotic system of claim 44, wherein the one or more markers include artificial markers consisting of a pattern of dots.
46. The robotic system of claim 44, wherein the one or more markers include natural markers consisting of geometry information of the computer aided design model.
47. A robotic system comprising: a camera; a robotic instrument; and one or more processors programmed so as to cooperatively perform the following tasks: receive images of video frames from the camera; determine video pose information of the robotic instrument within the images; estimate uncertainty in the determined video pose information in light of video information in response to view geometry statistics; receive kinematics information related to robotic movement of the robotic instrument; determine mechanical pose information from the kinematics information; and fuse the mechanical pose information and the video pose information in a state-space model using a covariance matrix configured to compensate for the estimated uncertainty in the video pose information to generate estimated pose information for the robotic instrument.
48. The robotic system of claim 47, wherein the state-space model includes a dynamic model and an observation model respectively including dynamic noise and observation noise that respectively have dynamic and observation Gaussian distributions respectively characterized by dynamic and observation covariance matrices, wherein the observation covariance matrix includes a sub-matrix for vision; and wherein one of the one or more processors is programmed to: configure the sub-matrix for vision to compensate for the estimated uncertainty of video information by modifying elements of the sub-matrix so as to adjust a standard deviation for the estimated pose information for the robotic instrument accordingly.
49. The robotic system of claim 47, wherein assuming an independent Gaussian noise model, the view geometry statistics are computed for one or more of digitization error/image resolution, feature/algorithm related image matching error, distance from object to camera, angle between object surface normal and line of sight, illumination, and specularity.